Abstract


We give polynomial time algorithms for quantitative (and qualitative) reachability analysis 
for Branching Markov Decision Processes (BMDPs). Specifically, given a BMDP, and given 
an initial population, where the objective of the controller is to maximize (or minimize) the 
probability of eventually reaching a population that contains an object of a desired (or undesired) 
type, we give algorithms for approximating the supremum (infimum) reachability probability, 
within desired precision $\epsilon > 0$, in time polynomial in the encoding size of the BMDP and in
$\log(1/\epsilon)$. We furthermore give P-time algorithms for computing $\epsilon$-optimal strategies for both
maximization and minimization of reachability probabilities. We also give P-time algorithms for 
all associated qualitative analysis problems, namely: deciding whether the optimal (supremum 
or infimum) reachability probabilities are 0 or 1. Prior to this paper, approximation of optimal 
reachability probabilities for BMDPs was not even known to be decidable. 

Our algorithms exploit the following basic fact: we show that for any BMDP, its maximum 
(minimum) non-reachability probabilities are given by the greatest fixed point (GFP) solution 
$g^* \in [0,1]^n$ of a corresponding monotone max (min) Probabilistic Polynomial System of equations
(max/minPPS), $x = P(x)$, which are the Bellman optimality equations for a BMDP with
non-reachability objectives. We show how to compute the GFP of max/minPPSs to desired 
precision in P-time. 

We also study more general branching simple stochastic games (BSSGs) with (non-)reachability 
objectives. We show that: (1) the value of these games is captured by the GFP, $g^*$, of a corresponding
max-minPPS, $x = P(x)$; (2) the quantitative problem of approximating the value is
in TFNP; and (3) the qualitative problems associated with the value are all solvable in P-time. 

1 Introduction 

Multi-type branching processes (BPs) are infinite-state purely stochastic processes that model the 
stochastic evolution of a population of entities of distinct types. The BP specifies for every type a 
probability distribution for the offspring of entities of this type. Starting from an initial population,
the process evolves from each generation to the next according to the probabilistic offspring rules.^1
Branching processes are a fundamental stochastic model with applications in many areas: physics,
biology, population genetics, medicine, etc. Branching Markov Decision Processes (BMDPs) provide
a natural extension of BPs where the evolution is not purely stochastic but can be potentially
influenced or controlled to some extent: a controller can take actions which affect the probability
distribution for the set of offspring of the entities of each type. The goal is to design a policy for
choosing the actions in order to optimize a desired objective.

^1 Branching processes are used both with discrete and with continuous time (where reproduction rules for each
type have associated rates instead of probabilities). However, the probabilities of extinction and reachability are
not time-dependent, and thus continuous-time processes can be studied via their corresponding discrete-time BPs,
obtained by simply normalizing the rates on rules for each type.

In recent years there has been great progress in resolving algorithmic problems for BMDPs 
with the objective of maximizing or minimizing the extinction probability, i.e., the probability that 
the population eventually becomes extinct. Polynomial time algorithms were developed for both 
maximizing and minimizing BMDPs for qualitative analysis, i.e. to determine whether the optimal 
extinction probability is 0, 1 or in-between [14], and for quantitative analysis, to compute the optimal
extinction probabilities to any desired precision [12]. However, key problems related to optimizing
BMDP reachability probabilities (the probability that the population eventually includes an entity 
having a target type) have remained open. 

Reachability objectives are very natural. Some types may be undesirable, in which case we want 
to avoid them to the extent possible. Or conversely, we may want to guide the process to reach 
certain desirable types. For example, branching processes have been used recently to model cancer 
tumor progression and multiple drug resistance of tumors due to multiple mutations ([H EQI EH])- 
It could be fruitful to model the introduction of multiple drugs (each of which controls/influences 
cells with a different type of mutation) via a “controller” that controls the offspring of different 
types, thus extending the current models (and associated software tools) which are based on BPs 
only, to controlled models based on BMDPs. A natural question one could ask then is to compute 
the minimum probability of reaching a bad (malignant) cell type, and compute a drug introduction
strategy that achieves (approximately) minimum probability. Doing this efficiently (in P-time) 
would avoid the combinatorial explosion of trying all possible combinations of drug therapies. 

In this paper we provide the first polynomial time algorithms for quantitative (and also qualitative)
reachability analysis for BMDPs. Specifically, we provide algorithms for $\epsilon$-approximating the
supremum probability, as well as the infimum probability, of reaching a given type (or a set of types)
starting from an initial type (or an initial population of types), up to any desired additive error
$\epsilon > 0$. We also give algorithms for computing $\epsilon$-optimal strategies which achieve such $\epsilon$-optimal
values. The running time of these algorithms (in the standard Turing model of computation) is
polynomial in both the encoding size of the BMDP and in $\log(1/\epsilon)$. We also give P-time algorithms
for the qualitative problems: we determine whether the supremum or infimum probability is 1 (or 
0), and if so we actually compute an optimal strategy that achieves 1 (0, respectively). 

In prior work [14], we studied the problem of optimizing extinction (a.k.a. termination) probabilities
for BMDPs, and showed that the optimal extinction probabilities are captured by the least fixed
point (LFP) solution $q^* \in [0,1]^n$ of a corresponding system of multivariate monotone probabilistic
max (min) polynomial equations called maxPPSs (respectively minPPSs), which form the Bellman 
optimality equations for termination of a BMDP. A maxPPS is a system of equations $x = P(x)$
over a vector $x$ of variables, where the right-hand-side of each equation is of the form $\max_j \{p_j(x)\}$,
where each $p_j(x)$ is a polynomial with non-negative coefficients (including the constant term) that
sum to at most 1 (such a polynomial is called probabilistic). A minPPS is defined similarly. In
[12], we introduced an algorithm, called Generalized Newton’s Method (GNM), for the solution of
maxPPSs and minPPSs, and showed that it computes the LFP of maxPPS and minPPS (and hence 
also the optimal termination probabilities for BMDPs) to desired precision in P-time. GNM is an 
iterative algorithm (like Newton’s) which in each iteration solves a suitable linear program (different 
ones for the max case and the min case). In [12] we also showed that for more general two player
zero-sum branching simple stochastic games (BSSGs), with the player objectives of maximizing and 
minimizing the extinction probability, we can approximate the value of the BSSG extinction game 
in TFNP. 

In this paper we first model the reachability problem for a BMDP by an appropriate system of 
equations: We show that the optimal non-reachability probabilities for a given BMDP are captured 
by the greatest fixed point (GFP), $g^* \in [0,1]^n$, of a corresponding maxPPS (or minPPS) system of
Bellman equations. We then show that one can approximate the GFP solution $g^* \in [0,1]^n$ of a
maxPPS (or minPPS), $x = P(x)$, in time polynomial in both the encoding size $|P|$ of the system
of equations and in $\log(1/\epsilon)$, where $\epsilon > 0$ is the desired additive error bound of the solution.^2 (The
model of computation is the standard Turing machine model.) We also show that the qualitative 
analysis of determining the coordinates of the GFP that are 0 and 1, can be done in P-time (and 
hence the same holds for the optimal reachability probabilities of BMDPs). 

More generally, we study branching simple stochastic games (BSSGs) with (non-)reachability 
objectives. These are two player zero-sum turn based stochastic games, where one player wishes to 
reach a target type while the other player wants to avoid that. These games generalize BPs and 
BMDPs. Such games can potentially be used to model adversarially some unknown parts of the 
controlled stochastic model. For example, in the setting suggested above for modeling injection of 
different drugs in cancer tumors, there could be some cell types whose offspring generation behavior 
in the presence of the drugs is unknown, and these cell types could be modeled in a worst-case 
fashion as types in the BSSG that are controlled by the adversary, where the adversary aims to 
maximize the probability of reaching the bad (malignant) cell types, whereas the controller wants a 
drug injection strategy for the controllable cell types in order to minimize this probability. 

We show that, firstly, the value of BSSG (non-)reachability games (the value exists, i.e., these 
games are determined) is captured by the GFP, $g^*$, of a corresponding max-minPPS, $x = P(x)$. A
max-minPPS is a system of equations $x_i = P_i(x)$, where $P_i(x)$ has either the form $\max_j \{p_j(x)\}$ or
the form $\min_j \{p_j(x)\}$, where the $p_j(x)$ are probabilistic polynomials. We show that the quantitative
problem of approximating the value of a BSSG, or equivalently the GFP of a max-minPPS, is in 
TFNP. We also show that the qualitative problems associated with deciding whether the value of 
a BSSG is 0 or 1 (as well as computing optimal strategies that “achieve” these values if one or the 
other is the case) are all solvable in polynomial time. This should be contrasted with a result in [14]
which shows that, for a given BSSG extinction game, the qualitative problem of deciding whether
the value is equal to 1 (see footnote 3) is at least as hard as Condon’s long standing open problem of computing
the value of finite state simple stochastic games (or deciding whether this value is, say, > 1/2). 

Our P-time algorithms for computing the GFP of minPPSs and maxPPSs to desired precision
make use of a variant of Generalized Newton Method (GNM), adapted for the computation of the
GFP instead of the LFP, with a key difference in the preprocessing step before applying
GNM. We first identify and remove only the variables that have value 1 in the GFP $g^*$ (we do not
remove the variables with value 0, unlike the LFP case). We show that for maxPPSs, once these
variables are removed, the remaining system with GFP $g^* < \mathbf{1}$ has a unique fixed point in $[0,1]^n$,
hence the GFP is equal to the LFP; applying GNM starting from the all-0 initial vector converges
quickly (in P-time, with suitable rounding) to the GFP (by [12]). For minPPSs, even after the
removal of the variables $x_i$ with $g^*_i = 1$, the remaining system may have multiple fixed points, and
we can have LFP < GFP. Nevertheless, we show that with the subtle change in the preprocessing
step, GNM, starting at the all-0 vector, remarkably “skips over” the LFP and converges to the GFP
solution $g^*$, in P-time.

^2 It is worth mentioning that it follows already from results in [15] that the quantitative decision problem for the
GFP of a PPS (or max/minPPS) is PosSLP-hard. In other words, the problem of deciding whether $g^*_i > p$, for a
given probability $p \in [0,1]$, where $g^*$ is the GFP of a given PPS, is PosSLP-hard. This follows immediately from the
proof in [15] (Theorem 5.3) of the PosSLP-hardness of deciding whether $q^*_i > p$, where $q^*$ is the LFP of a given PPS
(equivalently, the termination probabilities of a given 1-exit RMC). The PPS constructed in that proof is “acyclic”
and has a unique fixed point, and thus its LFP is equal to its GFP, i.e., $q^* = g^*$. Thus, we cannot hope to obtain
a P-time algorithm in the Turing model for deciding $g^*_i > p$, for a given PPS (or max/minPPS), without a major
breakthrough in the complexity of numerical computation.

^3 Equivalently, the problem of deciding whether the value is 1 for the termination game on a 1-exit Recursive
simple stochastic game (1-RSSG).

We note incidentally that for any monotone operator $P$ from $[0,1]^n$ to itself, one can define
another monotone operator $R : [0,1]^n \to [0,1]^n$, where $R(y) = \mathbf{1} - P(\mathbf{1} - y)$, such that the GFP $g^*$
of $x = P(x)$ and the LFP $r^*$ of $y = R(y)$ satisfy $g^* = \mathbf{1} - r^*$. (The second system is obtained from
the first by the change of variables $y = \mathbf{1} - x$.) Simple value iteration starting at $\mathbf{0}$ ($\mathbf{1}$) on $P(x)$
corresponds 1-to-1 to value iteration starting at $\mathbf{1}$ ($\mathbf{0}$, respectively) on $R(y)$. However, this does
not imply that computing the GFP of a max/minPPS is P-time reducible to computing the LFP
of a max/minPPS: even if $x = P(x)$ is a PPS, the polynomials of $R(y)$ in general have negative
coefficients. Value iteration on $R$ provably can converge exponentially slowly (starting at $\mathbf{0}$ or $\mathbf{1}$).
Moreover, naively applying Newton starting at $\mathbf{0}$ to $y = R(y)$ can fail because the Jacobians are no
longer non-negative, and the iterates need not even be defined (even after qualitative preprocessing). 
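
The change of variables y = 1 - x can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the one-variable PPS x = (1/2)x^2 + 1/4 and the helper names `P`, `R` and `iterate` are hypothetical choices, not taken from the paper. It confirms that value iteration from 1 on P mirrors value iteration from 0 on R, and expanding R(y) = 1/4 + y - (1/2)y^2 shows the negative coefficient that takes R outside the class of PPSs.

```python
from fractions import Fraction as F


def P(x):
    # Hypothetical one-variable PPS: x = (1/2) x^2 + 1/4 (coefficients sum to 3/4).
    return F(1, 2) * x * x + F(1, 4)


def R(y):
    # Complemented operator R(y) = 1 - P(1 - y) = 1/4 + y - (1/2) y^2.
    return 1 - P(1 - y)


def iterate(f, start, steps):
    """Return the list [start, f(start), f(f(start)), ...] with `steps` applications."""
    trace = [start]
    for _ in range(steps):
        trace.append(f(trace[-1]))
    return trace


p_trace = iterate(P, F(1), 6)  # value iteration from 1 on P
r_trace = iterate(R, F(0), 6)  # value iteration from 0 on R

# The two traces are complements of each other at every step.
assert all(p == 1 - r for p, r in zip(p_trace, r_trace))
print([float(p) for p in p_trace])
```

Exact rational arithmetic is used so that the complement identity can be asserted with equality rather than up to a floating-point tolerance.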

Comparing the properties of the LFP and GFP of max/minPPSs, we note that a difference for 
the qualitative problems is that for the GFP, both the value=0 and the value=1 questions depend
only on the structure of the model and not on its probabilities (the values of the coefficients),
whereas in the LFP case the value=1 question depends on the probabilities while value=0 does not
(see [m [14]). 

It is also worth noting that for BMDPs and BSSGs there is a natural “duality” between the objectives
of optimizing reachability probability and that of optimizing extinction probability. Namely,
we can view a BMDP or BSSG as a random/controlled process that generates a node-labeled (not
necessarily finite) tree. The objective of optimizing the extinction probability (i.e., the probability
of generating a finite tree), starting from a given type, can equivalently be rephrased as a “universal
reachability” objective on a slightly modified BMDP, where the goal is to optimize the probability
of eventually reaching the target type (namely “death”) on all paths starting at the root of the tree.
Likewise, the “universal reachability” objective for any BMDP can equivalently be rephrased as the
objective of optimizing extinction probability on a slightly modified BMDP. (We will explain these
in more detail in Section 2.) By contrast, the reachability objective that we study in this paper
is precisely the “existential reachability” objective for BMDPs and BSSGs, namely optimizing the
probability of reaching the target type on some path in the generated tree. 

We shall see that, despite this duality, there are some important differences between these two 
objectives, in particular when it comes to the existence of optimal strategies. Namely, we show that, 
unlike optimization of extinction (termination) probabilities for BMDPs, for which there always 
exists a static deterministic optimal strategy ([14]), there need not exist any optimal strategy at
all for maximizing reachability probability in a BMDP, i.e., the supremum probability may not be 
attainable. If the supremum probability is 1 however (and likewise if the value of the BSSG game 
is 1), we show that there does exist a strategy (for the player maximizing reachability probability) 
that achieves it, although not necessarily any static strategy. For the objective of minimizing 
reachability probability, we show there always exists an optimal deterministic and static strategy,
both in BMDPs and BSSGs. Regardless of what the optimal value is, we show that we can compute
in P-time an $\epsilon$-optimal static (possibly randomized) policy, for both maximizing and minimizing
reachability probability in a BMDP. 

Related work: BMDPs have been previously studied in both operations research (e.g., |191 
mw and computer science (e.g., [I2E1II3]). We have already mentioned the results in mm 
concerning the computation of the extinction probabilities of BMDPs and the computation of the 
LFP of max/minPPS. Branching processes are closely connected to stochastic context-free grammars,
1-exit Recursive Markov chains (1-RMC) [15], and the corresponding stateless probabilistic
pushdown processes, pBPA [10]: their extinction or termination probabilities are interreducible, and
they are all captured by the LFP of PPSs. The same is true for their controlled extensions, for
example the extinction probability of BMDPs and the termination probabilities of 1-exit Recursive
Markov Decision processes (1-RMDP) [14] are both captured by the LFP of maxPPS or minPPS.
A different type of objective of optimizing the total expected reward for 1-RMDPs (and equivalently
BMDPs) in a setting with positive rewards was studied in [13]: in this case the optimal values are
rational and can be computed exactly in P-time. 

The equivalence between BMDPs and 1-RMDPs however does not carry over to the reachability 
objective. The qualitative reachability problem for 1-RMDPs (equivalently BPA MDPs) and the 
extension to simple 2-person games 1-RSSGs (BPA games) were studied in ^ and |3] by Brazdil 
et al. It is shown in [3] that qualitative almost-sure reachability for 1-RMDPs can be decided in
P-time (both for maximizing and minimizing 1-RMDPs). However, for maximizing reachability 
probability, almost-sure and limit-sure reachability are not the same: in other words, the supremum 
reachability probability can be 1, but it may not be achieved by any strategy for the 1-RMDP. By 
contrast, for BMDPs we show that if the supremum reachability probability is 1, then there is a 
strategy that achieves it. This is one illustration of the fact that the equivalence between 1-RMDP 
and BMDP does not hold for the reachability objective. The papers do not address the limit- 
sure reachability problem, and in fact even the decidability of limit-sure reachability for 1-RMDPs 
remains open. 

Chen et al. [6] studied model checking of branching processes with respect to properties expressed
by deterministic parity tree automata and showed that the qualitative problem is in P (hence
this holds in particular for reachability probability in BPs), and that the quantitative problem of
comparing the probability with a rational is in PSPACE. Although not explicitly stated there, one
can use Lemma 20 of [6] and our algorithm from El to show that the reachability probabilities 
of BPs can be approximated in P-time. Bonnet et al. [2] studied a model of “probabilistic Basic
Parallel Processes”, which are syntactically close to Branching processes, except reproduction
is asynchronous and the entity that reproduces in each step is chosen randomly (or by a
scheduler/controller). None of the previous results have a bearing on the reachability problems for
BMDPs. 

Organization of the paper: Section 2 gives basic definitions and background. Section 3
characterizes the (non-)reachability problem for BMDPs, and more general BSSGs, in terms of
the GFP computation problem for max-minPPS equations, and discusses the existence of optimal
strategies for BMDPs. Section 4 gives a P-time algorithm for determining those variables with
value = 1 in the GFP of a max-minPPS. Section 5 analyses the GFP of PPSs, and shows we can
approximate it in P-time. Section 6 solves the GFP value approximation problem for maxPPSs in
P-time, and also shows how to compute an $\epsilon$-optimal deterministic static strategy for maxPPS in
P-time. Section 7 solves the GFP value approximation problem for minPPSs in P-time. Section
8 concerns the construction, in P-time, of $\epsilon$-optimal strategies for the GFP of a minPPS (this is
substantially harder than the maxPPS case). Section 9 gives a P-time algorithm for determining
those variables with value = 0 in the GFP of a max-minPPS (this is substantially harder than
the = 1 case done in Section 4). Section 10 shows that we can approximate the value of a BSSG
(non-)reachability game, and the GFP of a max-minPPS, in TFNP.

2 Definitions and Background 

We start by providing unified definitions of multi-type Branching processes (BPs), Branching MDPs 
(BMDPs), and Branching Simple Stochastic Games (BSSGs). Although most of our results are 
focused on BMDPs, since BSSGs provide the most general of these models we start by defining 
BSSGs, and then specializing them to obtain BMDPs and BPs. Throughout we use 0 and 1 to 
denote all-0 and all-1 vectors, respectively, of the appropriate dimensions. 

A Branching Simple Stochastic Game (BSSG) consists of a finite set $V = \{T_1, \ldots, T_n\}$ of
types, a finite non-empty set $A_i \subseteq \Sigma$ of actions for each type $T_i$ ($\Sigma$ is some finite action alphabet),
and a finite set $R(T_i, a)$ of probabilistic rules associated with each pair $(T_i, a)$, $i \in [n]$, where $a \in A_i$.
Each rule $r \in R(T_i, a)$ is a triple $(T_i, p_r, \alpha_r)$, which we denote by $T_i \xrightarrow{p_r} \alpha_r$, where $\alpha_r \in \mathbb{N}^n$ is an
$n$-vector of natural numbers that denotes a finite multi-set over the set $V$, and where $p_r \in (0,1] \cap \mathbb{Q}$
is the probability of the rule $r$ (which we assume is given by a rational number, for computational
purposes), where we assume that for all $T_i \in V$ and $a \in A_i$, the rule probabilities in $R(T_i, a)$ sum
to 1, i.e., $\sum_{r \in R(T_i, a)} p_r = 1$. For BSSGs, the types are partitioned into two sets: $V = V_{\max} \cup V_{\min}$,
$V_{\max} \cap V_{\min} = \emptyset$, where $V_{\max}$ contains those types “belonging” to player max, and $V_{\min}$ contains
those belonging to player min.

A Branching Markov Decision Process (BMDP) is a BSSG where one of the two sets $V_{\max}$
or $V_{\min}$ is empty. Intuitively, a BMDP (BSSG) describes the stochastic evolution of a population
of entities of different types in the presence of a controller (or two players) that can influence
the evolution. We can define a multi-type Branching Process (BP), by imposing a further
restriction, namely that all action sets $A_i$ must be singleton sets. Hence in a BP, players have no
choice of actions, and we can simply assume players don’t exist: a BP defines a purely stochastic 
process. 

A play (or trajectory) of a BSSG operates as follows: starting from an initial population (i.e.,
set of entities of given types) $X_0$ at time (generation) 0, a sequence of populations $X_1, X_2, \ldots$ is
generated, where $X_{k+1}$ is obtained from $X_k$ as follows. Player max (min) selects for each entity $e$
in set $X_k$ that belongs to max (to min, respectively) an available action $a \in A_i$ for the type $T_i$ of
entity $e$; then for each such entity $e$ in $X_k$ a rule $r \in R(T_i, a)$ is chosen randomly and independently
according to the rule probabilities $p_r$, where $a \in A_i$ is the action selected for that particular entity
$e$. Every entity is then replaced by a set of entities with the types specified by the right-hand side
multiset $\alpha_r$ of that chosen rule $r$. The process is repeated as long as the current population $X_k$ is
nonempty, and it is said to terminate (or become extinct) if there is some $k \geq 0$ such that $X_k = \emptyset$.
When there are $n$ types, we can view a population $X_k$ as a vector $X_k \in \mathbb{N}^n$, specifying the number
of objects of each type. We say that the process reaches a type $T_j$, if there is some $k \geq 0$ such that
$(X_k)_j > 0$.
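
The generation-by-generation evolution described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not code from the paper: the toy BMDP, the type names "A" and "B", the action names, and the helpers `next_generation` and `reaches` are all hypothetical, and only a static deterministic strategy (one fixed action per type) is modelled, whereas general strategies may depend on the whole history.

```python
import random
from collections import Counter

# Hypothetical toy BMDP (all names are illustrative, not from the paper).
# RULES[type][action] lists (probability, offspring multiset) pairs summing to 1.
RULES = {
    "A": {
        "split": [(0.5, Counter({"A": 2})), (0.5, Counter())],
        "mutate": [(1.0, Counter({"B": 1}))],
    },
    "B": {"stay": [(1.0, Counter({"B": 1}))]},
}


def next_generation(population, strategy, rng):
    """Build X_{k+1} from X_k: every entity independently fires one rule."""
    successor = Counter()
    for entity_type, count in population.items():
        options = RULES[entity_type][strategy[entity_type]]
        weights = [p for p, _ in options]
        for _ in range(count):
            _, offspring = rng.choices(options, weights=weights, k=1)[0]
            successor.update(offspring)
    return successor


def reaches(target, population, strategy, rng, max_generations):
    """Return True if `target` appears within `max_generations` generations."""
    for generation in range(max_generations + 1):
        if population[target] > 0:
            return True
        if sum(population.values()) == 0:  # extinct: nothing left to evolve
            return False
        if generation < max_generations:
            population = next_generation(population, strategy, rng)
    return False


rng = random.Random(0)
always_mutate = {"A": "mutate", "B": "stay"}
print(reaches("B", Counter({"A": 1}), always_mutate, rng, max_generations=3))
```

Populations are stored as multisets (`Counter`), which matches the view of a population as a vector of per-type counts; a finite horizon is imposed only because a simulation cannot run an infinite play.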

We can consider different objectives by the players. For example, in [a in] the objective 
considered was that the two players wish to maximize and minimize, respectively, the probability of 
termination (i.e., extinction of the population). It was shown in [14] that such BSSG games indeed
have a value, and in [12] a P-time algorithm was developed for approximating this value in the case
of max-BMDPs and min-BMDPs with the termination objective. 

In this paper we consider the reachability objective: namely where the goal of the two players, 
starting from a given population, is to maximize/minimize the probability of reaching a population 
which contains at least one entity of a given special type, $T_{f^*}$. It is perhaps not immediately clear
that a BSSG with such a reachability objective has a value, but we shall show that this is indeed 
the case. 

Regarding strategies, at each stage, $k$, each player is allowed, in principle, to select the actions
for the entities in $X_k$ that belong to it based on the whole past history, may use randomization
(a mixed strategy), and may make different choices for entities of the same type. The “history”
of the process up to time $k-1$ includes not only the populations $X_0, X_1, \ldots, X_{k-1}$, but also the
information on all the past actions and rules applied and the parent-child relationships between all
the entities up to the generation $X_{k-1}$. The history can be represented by a forest of depth $k-1$,
with internal nodes labelled by rules and actions, and whose leaves at level $k-1$ form the population
$X_{k-1}$. Thus, a strategy of a player is a function that maps every finite history (i.e., labelled forest
of some finite depth as above) to a probability distribution on the set of tuples of actions for the
entities in the current population (i.e. at the bottom level of the forest) that are controlled by that
player. Let $\Psi_1$, $\Psi_2$ be the sets of all strategies of players 1, 2. We say that a strategy is deterministic
if for every history it chooses one tuple of actions with probability 1. We say that a strategy is
static if for each type $T_i$ controlled by that player the strategy always chooses the same action $a_i$, or
the same probability distribution on actions, for all entities of type $T_i$ in all histories.^4 Our notion
of an arbitrary strategy is quite general (it can depend on all the details of the entire history, and
be randomized, etc.). However, it was shown in [14] that for the objective of optimizing extinction
probability, both players have optimal static strategies in BSSGs. We shall see that this is not the 
case for BMDPs or BSSGs with the reachability objective. 

Let us now observe, as mentioned in the Introduction, a natural “duality” between the objective of
optimizing extinction probability and that of optimizing reachability probability. A BMDP or BSSG
can also be viewed as a random/controlled process for generating a node-labeled, not necessarily
finite, tree (or a forest, in case the process is started with a population larger than 1). The nodes
of the tree denote objects, nodes are labeled by their type, and the edges in the tree denote the
parent-child relationships: when a rule $T_i \xrightarrow{p_r} \alpha_r$ is applied to some node $v$ of type $T_i$ in the tree,
the children of node $v$ will be in 1-1 correspondence with the multi-set of types given by $\alpha_r$. For a
given BSSG, optimizing the extinction probability (i.e., the probability of generating a finite tree),
starting from an object of a given type, can be rephrased as a “universal reachability” objective on
a slightly modified BSSG, where the objective is to optimize the probability of eventually reaching
a target type on all paths starting at the root of the generated tree. Specifically, the target type is
a newly introduced type, called death, and for all types $T_i$, every rule $T_i \to \emptyset$ in the original BSSG
is replaced by the rule $T_i \to \text{death}$ in the modified BSSG (with the same probability). Likewise,
the “universal reachability” objective for any BSSG can be rephrased as the objective of optimizing
extinction probability in a slightly modified BSSG. Namely, for all types $T_i$, every rule $T_i \to \alpha_r$
in the original BSSG, where the multiset $\alpha_r$ is nonempty, is replaced by the rule $T_i \to \alpha'_r$ (with
the same probability) in the revised BSSG, where the multiset $\alpha'_r$ is the same as $\alpha_r$ except that all
copies of the target type have been removed from $\alpha'_r$; moreover, for any non-target type $T_i$, a rule in
the original BSSG of the form $T_i \to \emptyset$ is replaced by the rule $T_i \to \text{dead}$ (with the same probability)
in the revised BSSG, where dead is a new type having only one associated rule: $\text{dead} \to \text{dead}$, with
probability 1.

^4 In [12] we called a strategy “static” if it was both deterministic and static. In this paper we will refer to these as
“deterministic static” strategies, because we will also need “randomized static” strategies, and want to differentiate
between them.

By contrast, the reachability problem that we study in this paper is precisely the “existential
reachability” objective for BMDPs, namely optimizing the probability of reaching the target type
on some path in the generated tree. 

Let us now consider in more detail the (non-)reachability objective. For a given initial population
$\mu \in \mathbb{N}^n$, with $(\mu)_{f^*} = 0$, and given integer $k \geq 0$, and strategies $\sigma \in \Psi_1$, $\tau \in \Psi_2$, we denote by
$g^k_{\sigma,\tau}(\mu)$ the probability that the process with initial population $\mu$, and strategies $\sigma$, $\tau$ does not reach
a population with an object of type $T_{f^*}$ in at most $k$ steps. In other words, this is the probability
that for all $0 \leq d \leq k$, we have $(X_d)_{f^*} = 0$. Let us denote by $g^*_{\sigma,\tau}(\mu)$ the probability that $(X_d)_{f^*} = 0$
for all $d \geq 0$.

We let $g^k(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^k_{\sigma,\tau}(\mu)$, and $g^*(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu)$; the last quantity
is the value of the non-reachability game for the initial population $\mu$. Likewise $g^k(\mu)$ is the value
of the $k$-step non-reachability game. We will show that determinacy holds for these games, i.e.
$g^*(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu) = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} g^*_{\sigma,\tau}(\mu)$, and similarly for $g^k(\mu)$. However, unlike
the case for extinction probabilities ([14]), it does not hold that both players have optimal static
strategies. 

If $\mu$ has a single entity of type $T_i$, we will write $g^*_i$ and $g^k_i$ instead of $g^*(\mu)$ and $g^k(\mu)$.

Given a BMDP (or BSSG), the goal is to compute the vector $g^*$ of the $g^*_i$'s, i.e. the vector
of non-reachability values of the different types. As we will see, from the $g^*_i$'s, we can compute
the value $g^*(\mu)$ for any initial population $\mu$, namely $g^*(\mu) = f(g^*, \mu) := \prod_i (g^*_i)^{\mu_i}$. The vector of
reachability values $r^*$ is of course $r^* = \mathbf{1} - g^*$, where $\mathbf{1}$ is the all-1 vector; the reachability value for
initial population $\mu$ is $r^*(\mu) = 1 - g^*(\mu)$.
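
As a quick illustration of this product formula, the sketch below computes the non-reachability value of a population from per-type values. The function names and the sample vector of per-type values are hypothetical and chosen only for the example; they are not results from the paper.

```python
from fractions import Fraction
from math import prod


def non_reachability_value(g_star, mu):
    """Product over types i of (g*_i) ** mu_i, for a population vector mu."""
    return prod(g ** m for g, m in zip(g_star, mu))


def reachability_value(g_star, mu):
    """Reachability value of the population: 1 minus the non-reachability value."""
    return 1 - non_reachability_value(g_star, mu)


# Hypothetical per-type non-reachability values for three types.
g_star = [Fraction(1, 2), Fraction(1, 3), Fraction(1)]
mu = [2, 1, 5]  # two entities of type 1, one of type 2, five of type 3

print(non_reachability_value(g_star, mu))  # (1/2)^2 * (1/3) * 1^5 = 1/12
print(reachability_value(g_star, mu))      # 11/12
```

Types whose per-type value is 1 contribute nothing to the product, however many copies the population contains, and a single entity of a type with value 0 forces the whole product to 0.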

We shall associate a system of min/max probabilistic polynomial Bellman equations, $x = P(x)$,
to each given BMDP or BSSG, that contains one variable $x_i$ and one equation $x_i = P_i(x)$ for
each type $T_i$, such that the vector $g^*$ of values of the BSSG non-reachability game for the different
starting types is given by the greatest fixed point (GFP) solution of $x = P(x)$ in $[0,1]^n$. We need
some notation first in order to introduce these Bellman equations.

For an $n$-vector of variables $x = (x_1, \ldots, x_n)$, and a vector $v \in \mathbb{N}^n$, we use the shorthand notation
$x^v$ to denote the monomial $x_1^{v_1} \cdots x_n^{v_n}$. Let $\langle \alpha_r \in \mathbb{N}^n \mid r \in R \rangle$ be a finite set of $n$-vectors of natural
numbers, indexed by the set $R$. Consider a multi-variate polynomial $P_i(x) = \sum_{r \in R} p_r x^{\alpha_r}$, for some
rational-valued coefficients $p_r$, $r \in R$. We shall call $P_i(x)$ a probabilistic polynomial if $p_r \geq 0$ for
all $r \in R$, and $\sum_{r \in R} p_r \leq 1$.
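
The definition of a probabilistic polynomial translates directly into a check on a sparse representation. The following is a minimal sketch under assumptions: the encoding of a polynomial as a list of (coefficient, exponent vector) pairs and the names `is_probabilistic` and `evaluate` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction as F


def is_probabilistic(poly):
    """True if all coefficients are non-negative and they sum to at most 1."""
    return all(c >= 0 for c, _ in poly) and sum(c for c, _ in poly) <= 1


def evaluate(poly, x):
    """Evaluate the sum over r of p_r * x ** alpha_r at the point x."""
    total = F(0)
    for coeff, alpha in poly:
        term = coeff
        for x_i, a_i in zip(x, alpha):
            term *= x_i ** a_i
        total += term
    return total


# Hypothetical polynomial in two variables: (1/2) x1^2 + (1/4) x2 + 1/4.
poly = [(F(1, 2), (2, 0)), (F(1, 4), (0, 1)), (F(1, 4), (0, 0))]

print(is_probabilistic(poly))                 # coefficients sum to exactly 1
print(evaluate(poly, (F(1, 2), F(1, 2))))     # 1/8 + 1/8 + 1/4 = 1/2
```

Because the coefficients are non-negative and sum to at most 1, such a polynomial maps points of the unit cube to values in [0, 1], which is what allows these systems to be iterated as operators on the unit cube.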

Definition 2.1. A probabilistic polynomial system of equations, $x = P(x)$, which we shall
call a PPS, is a system of $n$ equations, $x_i = P_i(x)$, in $n$ variables $x = (x_1, x_2, \ldots, x_n)$, where for
all $i \in \{1, 2, \ldots, n\}$, $P_i(x)$ is a probabilistic polynomial.

A maximum-minimum probabilistic polynomial system of equations, $x = P(x)$, called
a max-minPPS, is a system of $n$ equations in $n$ variables $x = (x_1, x_2, \ldots, x_n)$, where for all
$i \in \{1, 2, \ldots, n\}$, either:

• Max-polynomial: $P_i(x) = \max\{q_{i,j}(x) : j \in \{1, \ldots, m_i\}\}$. Or:

• Min-polynomial: $P_i(x) = \min\{q_{i,j}(x) : j \in \{1, \ldots, m_i\}\}$,

where each $q_{i,j}(x)$ is a probabilistic polynomial, for every $j \in \{1, \ldots, m_i\}$.

We shall call such a system a maxPPS (respectively, a minPPS) if for every $i \in \{1, \ldots, n\}$,
$P_i(x)$ is a Max-polynomial (respectively, a Min-polynomial).

Note that we can view a PPS in $n$ variables as a maxPPS, or as a minPPS, where $m_i = 1$ for
every $i \in \{1, \ldots, n\}$.

For computational purposes we assume that all the coefficients are rational. We assume that 
the polynomials in a system are given in sparse form, i.e., by listing only the nonzero terms, with 
the coefficient and the nonzero exponents of each term given in binary. We let $|P|$ denote the total
bit encoding length of a system $x = P(x)$ under this representation.
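To make the sparse representation concrete, here is a minimal sketch of one possible encoding: a polynomial is a list of terms, each holding a rational coefficient and only its nonzero exponents. The data layout and the names `is_probabilistic` and `evaluate` are assumptions made for illustration; the paper fixes only the encoding size, not a data structure.

```python
from fractions import Fraction

# Sparse form: a polynomial is a list of (coefficient, exponents) terms, where
# exponents maps a variable index to its nonzero power. Layout is illustrative.

def is_probabilistic(poly):
    """Check that every coefficient p_r is >= 0 and that they sum to at most 1."""
    coeffs = [c for c, _ in poly]
    return all(c >= 0 for c in coeffs) and sum(coeffs) <= 1

def evaluate(poly, x):
    """Evaluate sum_r p_r * x^(alpha_r) at the point x (a sequence of numbers)."""
    total = Fraction(0)
    for coeff, exponents in poly:
        term = Fraction(coeff)
        for var, power in exponents.items():
            term *= x[var] ** power
        total += term
    return total

# The polynomial (2/3) * x_1^2 + 1/3, listing only its nonzero terms.
q = [(Fraction(2, 3), {1: 2}), (Fraction(1, 3), {})]
x = [Fraction(1), Fraction(1, 2)]
value = evaluate(q, x)  # (2/3) * (1/2)^2 + 1/3
```

Because exponents are stored in binary, a term such as $x_1^{1000}$ costs only a few bits, which is why the normal form conversion must avoid expanding powers naively.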

We use max/minPPS to refer to a system of equations, $x = P(x)$, that is either a maxPPS
or a minPPS. We refer to systems of equations containing both max and min equations as
max-minPPSs.

It was shown in [14] that any max-minPPS, $x = P(x)$, has a least fixed point (LFP) solution,
$q^* \in [0,1]^n$, i.e., $q^* = P(q^*)$ and if $q = P(q)$ for some $q \in [0,1]^n$ then $q^* \le q$ (coordinate-wise
inequality). In fact, $q^*$ corresponds to the vector of values of a corresponding Branching Simple
Stochastic Game with the objective of extinction, starting at each type. As observed in dani], 
q* may in general contain irrational values, even in the case of pure PPSs (and the corresponding 
multi-type Branching process). 

In this paper we shall observe that any max-minPPS, $x = P(x)$, also has a greatest fixed
point (GFP) solution, $g^* \in [0,1]^n$, i.e., such that $g^* = P(g^*)$ and if $q = P(q)$ for some $q \in [0,1]^n$
then $q \le g^*$ (coordinate-wise inequality). In fact, in this case $g^*$ corresponds to the vector of values
of a corresponding branching simple stochastic game where the objective of the two players is to
maximize/minimize the probability of not reaching an undesired type (or set of types) starting at
each type. Again, $g^*$ may contain irrational coordinates, so we in general want to approximate
its coordinates (and the coordinates of $(\mathbf{1} - g^*)$, which constitute reachability values) to desired
precision. For a countable set $S$, let $\Delta(S)$ denote the set of probability distributions on $S$, i.e., the
set of functions $f : S \to [0,1]$ such that $\sum_{s \in S} f(s) = 1$.

Definition 2.2. We define a (possibly randomized) policy for max (min) in a max-minPPS, $x =
P(x)$, to be a function $\sigma : \{1, \ldots, n\} \to \Delta(\mathbb{N})$ that assigns a probability distribution to each variable
$x_i$ for which $P_i(x)$ is a max- (respectively, min-) polynomial, such that the support of $\sigma(i)$ is a
subset of $\{1, \ldots, m_i\}$, the possible $m_i = |A_i|$ different actions (i.e., choices of polynomials) available
in $P_i(x)$.

Intuitively, policies are akin to static strategies for BMDPs and BSSGs. For each variable $x_i$,
a policy selects a probability distribution over the probabilistic polynomials $q_{i,a}(x)$ that appear
on the RHS of the equation $x_i = P_i(x)$, and over which $P_i(x)$ takes the maximum/minimum.

Definition 2.3. For a max-minPPS, $x = P(x)$, and policies $\sigma$ and $\tau$ for the max and min players,
respectively, we write $x = P_{\sigma,\tau}(x)$ for the PPS obtained by fixing both these policies. We write
$x = P_{\sigma,*}(x)$ for the minPPS obtained by fixing $\sigma$ for the max player, and $x = P_{*,\tau}(x)$ for the
maxPPS obtained by fixing $\tau$ for the min player. More specifically, note that for policy $\sigma$ for player
max, we define the minPPS $x = P_{\sigma,*}(x)$ by $(P_{\sigma,*})_i(x) := \sum_{a \in A_i} \sigma(i)(a) \cdot q_{i,a}(x)$, for all $i$ that belong to
player max, and otherwise $(P_{\sigma,*})_i(x) := P_i(x)$. We similarly define $x = P_{*,\tau}(x)$ and $x = P_{\sigma,\tau}(x)$.

For a maxPPS (or minPPS), $x = P(x)$, and policy $\sigma$ for max (for min), we shall use the
abbreviated notation $x = P_\sigma(x)$ instead of $x = P_{\sigma,*}(x)$ (instead of $x = P_{*,\sigma}(x)$, respectively).
For a max-minPPS, $x = P(x)$, and a (possibly randomized) policy $\sigma$ for max, we use $q^*_{\sigma,*}$
and $g^*_{\sigma,*}$ to denote the LFP and GFP solution vectors for the corresponding minPPS $x = P_{\sigma,*}(x)$,
respectively. Likewise we use $q^*_{*,\tau}$ and $g^*_{*,\tau}$ to denote the LFP and GFP solutions of the maxPPS
$x = P_{*,\tau}(x)$. Similarly, for a maxPPS (or minPPS), $x = P(x)$, and a policy $\sigma$, we use $q^*_\sigma$ and $g^*_\sigma$
to denote the LFP and GFP of $x = P_\sigma(x)$.

Definition 2.4. For a max-minPPS, $x = P(x)$, a policy $\sigma^*$ is called optimal for max for the LFP
(respectively, the GFP) if $q^*_{\sigma^*,*} = q^*$ (respectively, $g^*_{\sigma^*,*} = g^*$).

An optimal policy $\tau^*$ for min for the LFP and GFP, respectively, is defined similarly.

For $\epsilon > 0$, a policy $\sigma'$ for max is called $\epsilon$-optimal for the LFP (respectively, GFP) if
$\|q^*_{\sigma',*} - q^*\|_\infty \le \epsilon$ (respectively, $\|g^*_{\sigma',*} - g^*\|_\infty \le \epsilon$). An $\epsilon$-optimal policy $\tau'$ for min is defined similarly.

It is convenient to put max-minPPSs in the following simple form. 

Definition 2.5. A max-minPPS in simple normal form (SNF), $x = P(x)$, is a system of n
equations in n variables $x_1, x_2, \ldots, x_n$ where each $P_i(x)$ for $i = 1, 2, \ldots, n$ is in one of three forms:

• Form L: $P_i(x) = a_{i,0} + \sum_{j=1}^{n} a_{i,j} x_j$, where $a_{i,j} \ge 0$ for all $j$, and such that $\sum_{j=0}^{n} a_{i,j} \le 1$

• Form Q: $P_i(x) = x_j x_k$ for some $j, k$

• Form M: $P_i(x) = \max\{x_j, x_k\}$ or $P_i(x) = \min\{x_j, x_k\}$, for some $j, k$;

we sometimes differentiate these two cases as Form $M_{\max}$ and $M_{\min}$, respectively.

We define SNF form for max/minPPSs analogously: only the definition of “Form M” changes
(restricting to max or min, respectively).

In the setting of a max-minPPS in SNF form, we will often say that a variable has form or type
L, Q, or M, to mean that $P_i(x)$ has the corresponding form. Also, for simplicity in notation, when
we talk about a deterministic policy, if $P_i(x)$ has form M, say $P_i(x) = \max\{x_j, x_k\}$, then when it
is clear from the context we will use $\sigma(i) = k$ to mean that the policy $\sigma$ chooses $x_k$ among the two
choices $x_j$ and $x_k$ available in $P_i(x) = \max\{x_j, x_k\}$.

Proposition 2.6 (cf. Proposition 7.3 [15]). Every max-minPPS, $x = P(x)$, can be transformed
in P-time to an “equivalent” max-minPPS, $y = Q(y)$, in SNF form, such that $|Q| \in O(|P|)$. More
precisely, the variables $x$ are a subset of the variables $y$, and both the LFP and GFP of $x = P(x)$ are,
respectively, the projection of the LFP and GFP of $y = Q(y)$ onto the variables $x$, and furthermore
an optimal policy (respectively, $\epsilon$-optimal policy) for the LFP (respectively, GFP) of $x = P(x)$ can
be obtained in P-time from an optimal (resp., $\epsilon$-optimal) policy for the LFP (respectively, GFP) of
$y = Q(y)$.

Proof. We can easily convert, in P-time, any max-minPPS into SNF form, using the following
procedure.

• For each equation $x_i = P_i(x) = \max\{p_1(x), \ldots, p_m(x)\}$, for each $p_j(x)$ on the right-hand-side
that is not a variable, add a new variable $x_k$, replace $p_j(x)$ with $x_k$ in $P_i(x)$, and add the new
equation $x_k = p_j(x)$. Do similarly if $P_i(x) = \min\{p_1(x), \ldots, p_m(x)\}$.

• If $P_i(x) = \max\{x_{j_1}, \ldots, x_{j_m}\}$ with $m > 2$, then add $m - 2$ new variables $x_{i_1}, \ldots, x_{i_{m-2}}$, set
$P_i(x) = \max\{x_{j_1}, x_{i_1}\}$, and add the equations $x_{i_1} = \max\{x_{j_2}, x_{i_2}\}$, $x_{i_2} = \max\{x_{j_3}, x_{i_3}\}$, ...,
$x_{i_{m-2}} = \max\{x_{j_{m-1}}, x_{j_m}\}$. Do similarly if $P_i(x) = \min\{x_{j_1}, \ldots, x_{j_m}\}$ with $m > 2$.

• For each equation $x_i = P_i(x) = \sum_{j=1}^{m} p_j x^{\alpha_j}$, where $P_i(x)$ is a probabilistic polynomial that
is not just a constant or a single monomial, replace every (non-constant) monomial $x^{\alpha_j}$ on
the right-hand-side that is not a single variable by a new variable $x_{i_j}$ and add the equation
$x_{i_j} = x^{\alpha_j}$.

• For each variable $x_i$ that occurs in some polynomial with exponent higher than 1, introduce
new variables $x_{i_1}, \ldots, x_{i_k}$, where $k$ is the base-2 logarithm (rounded down) of the highest exponent of $x_i$ that occurs in
$P(x)$, and add equations $x_{i_1} = x_i^2$, $x_{i_2} = x_{i_1}^2$, ..., $x_{i_k} = x_{i_{k-1}}^2$. For every occurrence of a higher
power $x_i^l$, $l > 1$, of $x_i$ in $P(x)$, if the binary representation of the exponent $l$ is $a_k \ldots a_2 a_1 a_0$,
then we replace $x_i^l$ by the product of the variables $x_{i_j}$ such that the corresponding bit $a_j$ is
1, and $x_i$ if $a_0 = 1$. After we perform this replacement for all the higher powers of all the
variables, every polynomial of total degree $> 2$ is just a product of variables.

• If a polynomial $P_i(x) = x_{j_1} \cdots x_{j_m}$ in the current system is the product of $m > 2$ variables,
then add $m - 2$ new variables $x_{i_1}, \ldots, x_{i_{m-2}}$, set $P_i(x) = x_{j_1} x_{i_1}$, and add the equations
$x_{i_1} = x_{j_2} x_{i_2}$, $x_{i_2} = x_{j_3} x_{i_3}$, ..., $x_{i_{m-2}} = x_{j_{m-1}} x_{j_m}$.

Now all equations are of the form L, Q, or M.

The above procedure allows us to convert any max-minPPS into one in SNF form by introducing
$O(|P|)$ new variables and blowing up the size of $P$ by a constant factor $O(1)$. It is clear that both
the LFP and the GFP of $x = P(x)$ arise as the projections of the LFP and GFP of $y = Q(y)$ onto
the $x$ variables. Furthermore, there is an obvious (and easy to compute) bijection between policies
for the resulting SNF form max-minPPS and the original max-minPPS. □
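The repeated-squaring step of the procedure is what keeps the conversion polynomial even though exponents are encoded in binary. The sketch below illustrates only that step; the function names `squaring_chain` and `decompose_power` are illustrative and do not come from the paper.

```python
def squaring_chain(max_exponent):
    """Exponents 2, 4, ..., 2^k represented by the new variables x_{i_1}, ..., x_{i_k},
    where k = floor(log2(max_exponent)) and x_{i_j} stands for x_i^(2^j)."""
    k = max_exponent.bit_length() - 1
    return [2 ** j for j in range(1, k + 1)]

def decompose_power(l):
    """Powers of two whose sum is l, read off the binary representation of l.
    An entry 1 stands for x_i itself; an entry 2^j (j >= 1) stands for x_{i_j}."""
    return [1 << j for j in range(l.bit_length()) if (l >> j) & 1]

# x_i^13 with 13 = 1101 in binary becomes the product x_i * x_{i_2} * x_{i_3}.
factors = decompose_power(13)
chain = squaring_chain(13)
```

The number of new variables is logarithmic in the exponent, so it is linear in the number of bits used to write the exponent down.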

Thus from now on, and for the rest of this paper, we may assume, without loss of generality, that
all max-minPPSs are in SNF form.

A non-trivial fact established in [14] is that for the LFP of a max-minPPS, both players always
have an optimal deterministic policy:

Theorem 2.7 ([14], Theorem 2). For any max-minPPS, $x = P(x)$, for both the maximizing and
minimizing player there always exists an optimal deterministic policy, for the LFP.

As we shall show, for the GFP of a max-minPPS $x = P(x)$ there always exists an optimal
deterministic policy $\sigma^*$ for the maximizing player, whereas in general there does not exist any
optimal policy at all for the minimizing player, even for the GFP of a minPPS $x = P(x)$.

Nevertheless, we shall show that for any $\epsilon > 0$, there always exists an $\epsilon$-optimal randomized
policy for the GFP for the minimizing player in any max-minPPS. Furthermore, we shall show how
to compute such a policy in P-time for minPPSs.

Definition 2.8. The dependency graph of a max-minPPS $x = P(x)$ is a directed graph that has
one node for each variable $x_i$ and contains an edge $(x_i, x_j)$ if $x_j$ appears in $P_i(x)$. The dependency
graph of a BSSG has one node for each type, and contains an edge $(T_i, T_j)$ if there is an action
$a \in A_i$ and a rule $T_i \xrightarrow{p_r} \alpha_r$ in $R(T_i, a)$ such that $T_j$ appears in $\alpha_r$.

2.1 Generalized Newton’s Method 

The problem of approximating efficiently the LFP of a PPS was solved in m, by using Newton’s 
method (combined with suitable rounding), applied after elimination of the variables with LFP 
value 0 and 1. We first recall the definition of Newton iteration for PPSs.

Definition 2.9. For a PPS $x = P(x)$ we use $B(x)$ to denote the Jacobian matrix of partial
derivatives of $P(x)$, i.e., $B(x)_{i,j} := \frac{\partial P_i(x)}{\partial x_j}$. For a point $x \in \mathbb{R}^n$, if $(I - B(x))$ is non-singular, then we
define one Newton iteration at $x$ via the operator:

$N(x) = x + (I - B(x))^{-1}(P(x) - x)$

Given a max/minPPS, $x = P(x)$, and a policy $\sigma$, we use $N_\sigma(x)$ to denote the Newton operator of
the PPS $x = P_\sigma(x)$; i.e., letting $B_\sigma(x)$ denote the Jacobian of $P_\sigma(x)$, if $(I - B_\sigma(x))$ is non-singular
at a point $x \in \mathbb{R}^n$, then $N_\sigma(x) = x + (I - B_\sigma(x))^{-1}(P_\sigma(x) - x)$.
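To illustrate the operator $N(x)$, here is a minimal sketch of a single Newton step in exact rational arithmetic. The two-variable PPS used below, and the helper names `solve` and `newton_step`, are hypothetical choices made for this example; the sketch also omits the preprocessing and rounding that the algorithms in the text rely on.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination over rationals (A assumed non-singular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def newton_step(P, jacobian, x):
    """N(x) = x + (I - B(x))^{-1} (P(x) - x)."""
    n = len(x)
    B = jacobian(x)
    I_minus_B = [[(1 if i == j else 0) - B[i][j] for j in range(n)] for i in range(n)]
    residual = [p - xi for p, xi in zip(P(x), x)]
    delta = solve(I_minus_B, residual)
    return [xi + d for xi, d in zip(x, delta)]

quarter, half = Fraction(1, 4), Fraction(1, 2)

def P(x):
    # Hypothetical PPS: x0 = x0^2/4 + x1/4 + 1/2,  x1 = x0*x1/2 + 1/2.
    return [quarter * x[0] ** 2 + quarter * x[1] + half, half * x[0] * x[1] + half]

def jacobian(x):
    # Partial derivatives of the two polynomials above.
    return [[half * x[0], quarter], [half * x[1], half * x[0]]]

x1 = newton_step(P, jacobian, [Fraction(0), Fraction(0)])
```

Solving the linear system directly, rather than forming the inverse matrix, is a common implementation choice and gives the same step.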

Definition 2.10. For a max/minPPS, $x = P(x)$, with n variables (in SNF form), the linearization
of $P(x)$ at a point $y \in \mathbb{R}^n$ is a system of max/min linear functions denoted by $P^y(x)$, which has
the following form:

if $P_i(x)$ has form L or M, then $P^y_i(x) = P_i(x)$, and

if $P_i(x)$ has form Q, i.e., $P_i(x) = x_j x_k$ for some $j, k$, then

$P^y_i(x) = y_j x_k + x_j y_k - y_j y_k$
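The Form Q case can be read as the first-order Taylor expansion of the product $x_j x_k$ around the point $y$. The short derivation below is added as a reading aid and is not part of the original text; it also records the exact error term.

```latex
\begin{align*}
x_j x_k &\approx y_j y_k + y_k (x_j - y_j) + y_j (x_k - y_k) \\
        &= y_j x_k + x_j y_k - y_j y_k \;=\; P^y_i(x), \\
x_j x_k - P^y_i(x) &= (x_j - y_j)(x_k - y_k).
\end{align*}
```

In particular, the linearization never exceeds the true product when $x_j - y_j$ and $x_k - y_k$ have the same sign, for instance when $x \ge y$ coordinate-wise.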

We can consider the linearization of a PPS, $x = P_\sigma(x)$, obtained as the result of fixing a policy,
$\sigma$, for a max/minPPS, $x = P(x)$.

Definition 2.11. $P^y_\sigma(x) := (P_\sigma)^y(x)$.

Note that the linearization $P^y(x)$ only changes equations of form Q, and using a policy $\sigma$ only
changes equations of form M, so these operations are independent in terms of the effects they have
on the underlying equations, and thus $P^y_\sigma(x) = (P_\sigma)^y(x) = (P^y)_\sigma(x)$.

We now recall and adapt from [12] the definition of distinct iteration operators for a maxPPS
and a minPPS, both of which we shall refer to with the overloaded notation $I(x)$. These operators
serve as the basis for Generalized Newton’s Method (GNM) to be applied to maxPPSs and minPPSs,
respectively. We need to slightly adapt the definition of operator $I(x)$, specifying the conditions on
the GFP $g^*$ under which the operator is well-defined:

Definition 2.12. For a maxPPS, $x = P(x)$, with GFP $g^*$, such that $0 < g^* < 1$, and for a real
vector $y$ such that $0 \le y \le g^*$, we define the operator $I(y)$ to be the unique optimal solution, $a \in \mathbb{R}^n$,
to the following mathematical program: Minimize: $\sum_i a_i$; Subject to: $P^y(a) \le a$.

For a minPPS, $x = P(x)$, with GFP $g^*$, such that $0 < g^* < 1$, and for a real vector $y$ such that
$0 \le y \le g^*$, we define the operator $I(y)$ to be the unique optimal solution $a \in \mathbb{R}^n$ to the following
mathematical program: Maximize: $\sum_i a_i$; Subject to: $P^y(a) \ge a$.

In both cases, the mathematical programs can be solved using Linear Programming. In the case of a
maxPPS, the constraint $P^y_i(a) \le a_i$ for each variable $x_i$ of form L or Q is linear, and the constraint
for a variable $x_i$ of form M with $P_i(x) = \max(x_j, x_k)$ can be replaced by the two inequalities $a_j \le a_i$
and $a_k \le a_i$. Similarly, in the case of a minPPS, the constraints for variables of form L and Q
are linear, and the constraint $P^y_i(a) \ge a_i$ for a variable $x_i$ of form M with $P_i(x) = \min(x_j, x_k)$
can be replaced by the two inequalities $a_j \ge a_i$ and $a_k \ge a_i$. A priori, it is not clear whether the
mathematical programs have a unique solution, and hence whether the above “definitions” of $I(x)$
for maxPPSs and minPPSs are well-defined. We will see that they are (again, adapting facts for
GNM applied to LFP computation from [12]).

We require a rounded version of GNM, defined in [12] as follows.

GNM, with rounding parameter $h$: Starting at $x^{(0)} := 0$, for $k \ge 0$, compute $x^{(k+1)}$ from
$x^{(k)}$ as follows: first calculate $I(x^{(k)})$, then for each coordinate $i = 1, 2, \ldots, n$, set $x^{(k+1)}_i$ to be
the maximum (non-negative) multiple of $2^{-h}$ which is $\le \max\{0, I(x^{(k)})_i\}$. (In other words, round
$I(x^{(k)})$ down to the nearest multiple of $2^{-h}$ and ensure it is non-negative.)
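The rounding step can be sketched in a few lines of exact arithmetic. The function name `round_down` is an illustrative assumption; the sketch covers only the rounding of one coordinate, not the computation of $I(x^{(k)})$ itself.

```python
from fractions import Fraction
from math import floor

def round_down(value, h):
    """Largest non-negative multiple of 2^-h that is <= max(0, value)."""
    scale = 2 ** h
    clipped = max(Fraction(0), Fraction(value))
    return Fraction(floor(clipped * scale), scale)

a = round_down(Fraction(5, 8), 2)    # multiples of 1/4: result is 1/2
b = round_down(Fraction(-1, 3), 4)   # negative inputs are clipped to 0
c = round_down(Fraction(1, 3), 3)    # multiples of 1/8: result is 1/4
```

Keeping every iterate a multiple of $2^{-h}$ bounds its bit length by roughly $h$ bits per coordinate, which is the purpose of rounding in an exact-arithmetic implementation.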

3 Greatest Fixed Points capture non-reachability values 

For any given BSSG, $G$, with a specified special target type $T_{f^*}$, we will construct a max-minPPS,
$x = P(x)$, and show that the vector $g^*$ of non-reachability values for $(G, T_{f^*})$ is precisely the greatest
fixed point $g^* \in [0,1]^n$ of $x = P(x)$.

The system $x = P(x)$ will have one variable $x_i$ and one equation $x_i = P_i(x)$, for each type
$T_i \ne T_{f^*}$. For each $i \ne f^*$, the min/max probabilistic polynomial $P_i(x)$ is constructed as follows.
For all $j \in A_i$, let $R'(T_i, j) := \{r \in R(T_i, j) : (\alpha_r)_{f^*} = 0\}$ denote the set of rules for type $T_i$
and action $j$ that generate a multiset $\alpha_r$ not containing any element of type $T_{f^*}$. $P_i(x)$ contains
one probabilistic polynomial $q_{i,j}(x)$ for each action $j \in A_i$, with $q_{i,j}(x) = \sum_{r \in R'(T_i, j)} p_r x^{\alpha_r}$. In
particular, note that we do not include, in the sum that defines $q_{i,j}(x)$, any monomial $p_{r'} x^{\alpha_{r'}}$
associated with a rule $r'$ which generates at least one object of the special type $T_{f^*}$. Then, if
type $T_i$ belongs to the max player, who aims to minimize the probability of not reaching an object
of type $T_{f^*}$, we define $P_i(x) = \min_{j \in A_i} q_{i,j}(x)$. Likewise, if type $T_i$ belongs to the min player,
whose aim is to maximize the probability of not reaching an object of type $T_{f^*}$, then we define
$P_i(x) = \max_{j \in A_i} q_{i,j}(x)$.
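The construction can be sketched in code on a toy BMDP with rules $A \to AA$ and $A \to B$ (two actions of a controlled type), $B \to C$ and $B \to \emptyset$ each with probability 1/2, and target type $C$. The dictionary layout and the names `non_reachability_system` and `apply_P` are assumptions made for illustration only.

```python
from collections import Counter
from fractions import Fraction

def non_reachability_system(rules, target):
    """Keep, for every non-target type and action, only the rules in R'
    (those whose offspring multiset contains no object of the target type)."""
    system = {}
    for typ, actions in rules.items():
        if typ == target:
            continue
        system[typ] = {
            action: [(p, Counter(offspring)) for p, offspring in rule_list
                     if target not in offspring]
            for action, rule_list in actions.items()
        }
    return system

def apply_P(system, combine, x):
    """One application of P; combine[typ] is the built-in min or max for that type."""
    result = {}
    for typ, actions in system.items():
        values = []
        for terms in actions.values():
            total = Fraction(0)
            for p, offspring in terms:
                monomial = Fraction(p)
                for child, count in offspring.items():
                    monomial *= x[child] ** count
                total += monomial
            values.append(total)
        result[typ] = combine[typ](values)
    return result

half = Fraction(1, 2)
rules = {
    "A": {"dup": [(1, ("A", "A"))], "lottery": [(1, ("B",))]},
    "B": {"only": [(half, ("C",)), (half, ())]},
}
system = non_reachability_system(rules, target="C")
# A belongs to the player maximizing reachability, so its equation uses min.
combine = {"A": min, "B": min}
x1 = apply_P(system, combine, {"A": Fraction(1), "B": Fraction(1)})
```

Dropping the rule $B \to C$ is what turns the polynomial for $B$ into the constant 1/2: the discarded probability mass is exactly the chance of hitting the target in one step.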

Note the swapped roles that max and min play in the equations, versus the goal of the
corresponding player in terms of the reachability objective. This swap is necessary because, whereas
the objectives of the players are to maximize or minimize reachability probabilities, the equations
we have constructed will capture, in their greatest fixed point (GFP) solution, the optimal
non-reachability values $g^*$.

The following theorem, which is key, is analogous to a theorem proved in [14], which establishes a
similar relationship between the LFP of a max-minPPS and the extinction values of a BSSG:

Theorem 3.1. The non-reachability value vector $g^* \in [0,1]^n$ of the BSSG is equal to the Greatest
Fixed Point (GFP) of the operator $P(\cdot)$ in $[0,1]^n$. Thus, $g^* = P(g^*)$, and for all fixed points $g' =
P(g')$, $g' \in [0,1]^n$, $g' \le g^*$. Furthermore, for any initial population $\mu$, the optimal non-reachability
values satisfy $g^*(\mu) = \prod_i (g^*_i)^{\mu_i}$ and $g^*(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu) = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} g^*_{\sigma,\tau}(\mu)$.

In particular, such games are determined.

Proof. Let $x^k$ denote the k-fold application of $P$ on the all-1 vector, i.e., $x^0 = \mathbf{1}$, and $x^k = P(x^{k-1})$
for $k > 0$. $P(\cdot)$ defines a monotone operator, $P : [0,1]^n \to [0,1]^n$, that maps $[0,1]^n$ to itself. Thus,
the sequence $x^k$ is (component-wise) monotonically non-increasing as a function of $k$, bounded from
below by the all-0 vector, and thus by Tarski’s theorem it converges to the GFP, $x^* \in [0,1]^n$, of the
monotone operator $P(\cdot)$, as $k \to \infty$. We will first show the following lemma.

Lemma 3.2. For any integer $k \ge 0$ and any finite non-empty initial population $\mu$ (expressed as an
n-vector) which does not contain any element of type $T_{f^*}$, the value $g^k(\mu) := \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^k_{\sigma,\tau}(\mu)$
of not reaching an element of type $T_{f^*}$ in $k$ steps is $g^k(\mu) = f(x^k, \mu) := \prod_i (x^k_i)^{\mu_i}$. Furthermore,
there are strategies of the two players (in fact deterministic strategies), $\sigma_k \in \Psi_1$ and $\tau_k \in \Psi_2$, that
achieve this value, i.e., $g^k(\mu) = \inf_{\tau \in \Psi_2} g^k_{\sigma_k,\tau}(\mu) = \sup_{\sigma \in \Psi_1} g^k_{\sigma,\tau_k}(\mu)$.

Proof. We show the claim by induction on $k$. The basis, $k = 0$, is trivial: namely, we only have
variables $x_i$ for each type $T_i \ne T_{f^*}$. Thus, clearly, starting with any finite non-empty population
of objects of types $T_i \ne T_{f^*}$, the (optimal) probability of not reaching an object of type $T_{f^*}$ within
0 steps is 1. For the induction part, consider the generation of population $X_1$ from $X_0$ in step
1. We show first that $g^k(\mu) \ge f(x^k, \mu) := \prod_i (x^k_i)^{\mu_i}$. Consider the following strategy $\sigma_k$ for
the max player (the player trying to maximize the probability of not reaching the type $T_{f^*}$). For
each entity in the initial population $X_0 = \mu$ of a max type $T_i$, the max player selects in step 1
(deterministically) an action $a \in A_i$ that maximizes the expression $\sum_{r \in R'(T_i, a)} p_r f(x^{k-1}, \alpha_r)$ on the
right side of the equation $x^k_i = P_i(x^{k-1})$. Once the min player also selects actions for the entities of
min type in $X_0$, and rules for all the entities are chosen probabilistically to generate the population
$X_1$ for time 1, the max player thereafter follows an optimal $(k-1)$-step strategy $\sigma_{k-1}$ starting from
$X_1$. If we assume inductively that $\sigma_{k-1}$ is deterministic, then $\sigma_k$ is also deterministic. (It is not
static however; the action chosen for an entity of a given type in a population $X_i$ in the process
may depend on the time $i$.)

Let $\tau$ be any strategy of the min player. Consider a combination of actions chosen with nonzero
probability by the min player in step 1 for the entities of min type in $X_0 = \mu$. After this, a
combination of rules is chosen randomly and independently for all the entities of $\mu$ and the population
$X_1$ is generated accordingly with probability that is the product of the rule probabilities that were
applied (because the rules are chosen independently). By the induction hypothesis, the value with
which the population $X_1$ does not reach a type $T_{f^*}$ in the next $k - 1$ steps (i.e., by time $k$) is
$g^{k-1}(X_1) = f(x^{k-1}, X_1)$. If, for each possible set $X_1$ (there are finitely many possibilities), we
multiply $f(x^{k-1}, X_1)$ with the probability of the combination of rules that can be used in step
1 to generate $X_1$ from $X_0$, and we sum this over all possible $X_1$, we can write the result as a
product of $|\mu|$ terms, one for each entity in $\mu$. The term for an entity of max or min type $T_i$ is
$\sum_{r \in R'(T_i, a)} p_r f(x^{k-1}, \alpha_r)$, where $a$ is the action selected for this entity by the min or max player in
step 1. For the max player, we selected an action $a \in A_i$ that maximizes this expression, therefore
the term for a max entity is equal to $P_i(x^{k-1}) = x^k_i$.

For an entity that belongs to the min player, no matter which action the player chose, the term
is greater than or equal to the minimum value over all available actions, which is $P_i(x^{k-1}) = x^k_i$.
Hence, for any combination of actions chosen by the min player in step 1, the probability that the
process does not reach an object of type $T_{f^*}$ by step $k$ under the strategies $\sigma_k$ and $\tau$ is at least $f(x^k, \mu)$.
Therefore, this holds also if $\tau$ makes a randomized selection in step 1, i.e., assigns nonzero probability
to more than one combination of actions for the min entities in $\mu$. Thus, $\inf_{\tau \in \Psi_2} g^k_{\sigma_k,\tau}(\mu) \ge f(x^k, \mu)$
and hence $g^k(\mu) \ge f(x^k, \mu)$.

We can give a symmetric argument for the min player to prove the reverse inequality. Define
strategy $\tau_k$ for the min player as follows. In step 1, the min player chooses for each
entity of min type $T_i$ in the initial population $\mu$ an action $a \in A_i$ that minimizes the expression
$\sum_{r \in R'(T_i, a)} p_r f(x^{k-1}, \alpha_r)$ on the right side of the equation $x^k_i = P_i(x^{k-1})$, and then, once the max
player has chosen actions for the max entities of $\mu$, and rules are selected and applied to
generate the population $X_1$, the min player follows the optimal deterministic strategy $\tau_{k-1}$ starting
from $X_1$ (assumed to exist by induction). By a symmetric argument to the max player case,
it is easy to see that $\sup_{\sigma \in \Psi_1} g^k_{\sigma,\tau_k}(\mu) \le f(x^k, \mu)$ and hence $g^k(\mu) \le f(x^k, \mu)$. It follows that
$g^k(\mu) = \inf_{\tau \in \Psi_2} g^k_{\sigma_k,\tau}(\mu) = \sup_{\sigma \in \Psi_1} g^k_{\sigma,\tau_k}(\mu) = f(x^k, \mu)$.

□

In particular, for singleton initial populations, the Lemma implies that $g^k_i = x^k_i$ for all types
$T_i \ne T_{f^*}$, and for all $k \ge 0$.

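Let $x^* = \lim_{k \to \infty} x^k$ denote the Greatest Fixed Point (GFP) of the equation $x = P(x)$. We
will show that for any initial population $\mu$, the “value” $g^*(\mu) := \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu)$ of not ever
reaching a population containing an object of type $T_{f^*}$ satisfies $g^*(\mu) = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} g^*_{\sigma,\tau}(\mu) =
f(x^*, \mu)$. In particular, these games are indeed determined. For singleton populations, this implies
that $g^*_i = x^*_i$ for all types $T_i \ne T_{f^*}$.

Since $x^k$ converges to $x^*$ from above as $k \to \infty$, the sequence $f(x^k, \mu)$ converges to $f(x^*, \mu)$ from
above. Thus, for every $\epsilon > 0$ there is a $k(\epsilon)$ such that $f(x^*, \mu) \le f(x^{k(\epsilon)}, \mu) < f(x^*, \mu) + \epsilon$.

From the proof of Lemma 3.2, the strategy $\tau_{k(\epsilon)}$ of the min player (who is minimizing the
probability of not reaching $T_{f^*}$ in $k(\epsilon)$ rounds) satisfies, for all strategies $\sigma \in \Psi_1$,
$g^*_{\sigma,\tau_{k(\epsilon)}}(\mu) \le g^{k(\epsilon)}_{\sigma,\tau_{k(\epsilon)}}(\mu) \le \sup_{\sigma \in \Psi_1} g^{k(\epsilon)}_{\sigma,\tau_{k(\epsilon)}}(\mu) = f(x^{k(\epsilon)}, \mu) < f(x^*, \mu) + \epsilon$.
Since this holds for every $\epsilon > 0$, it follows that
$g^*(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu) \le \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} g^*_{\sigma,\tau}(\mu) \le f(x^*, \mu)$.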

For the converse inequality, let $\sigma^*$ be the static deterministic strategy for the max player (who
is trying to maximize the probability of not reaching $T_{f^*}$), which always chooses for each entity
of max type $T_i$ an action $a \in A_i$ that maximizes the expression $\sum_{r \in R'(T_i, a)} p_r f(x^*, \alpha_r)$. If we fix
the actions for all the max types according to $\sigma^*$, the BSSG $G$ becomes a minimizing BMDP $G'$
where all the max types of $G$ now become choice-less or “random” types (meaning that no choice
is available to the max player: it has only one action it can take at every type that belongs to
it). Let $x = P'(x)$ be the set of equations for $G'$; for the min types $T_i$ of $G'$, the equation is the
same, i.e., $P'_i = P_i$; whereas for max types $T_i$ the function on the right-hand side changes from
$P_i(x) = \max_{a \in A_i} \sum_{r \in R'(T_i, a)} p_r f(x, \alpha_r)$ to $P'_i(x) = \sum_{r \in R'(T_i, a_i)} p_r f(x, \alpha_r)$, for some specific action
$a_i \in A_i$. Thus, $P'(x) \le P(x)$ for all $x \in [0,1]^n$. Let $y^k$, $k = 0, 1, \ldots$, be the vector resulting from the
k-fold application of the operator $P'$ on the all-1 vector. Then $y^k \le x^k$ for all $k$, and therefore the
GFP $y^*$ of $P'$ satisfies $y^* \le x^*$, where $x^*$ is the GFP of $P$. However, $x^*$ is a fixed point of $P'$, since
we have chosen actions for all the max types $T_i$ that achieve the maximum in $P_i(x^*)$. Therefore,
$x^* = y^*$, and this vector is the GFP of both $P'$ and $P$.

Consider any fixed strategy $\tau$ of the min player starting from initial population $\mu$. Applying
Lemma 3.2 to the BMDP $G'$, we know that for every $k$, the probability, using strategy $\tau$ in $G'$, of not
reaching the type $T_{f^*}$ in $k$ steps, starting in population $\mu$, is at least $f(y^k, \mu)$. Therefore, the optimal
(infimum) probability of not reaching a type $T_{f^*}$ in any number of steps is at least $\lim_{k \to \infty} f(y^k, \mu) =
f(y^*, \mu) = f(x^*, \mu)$. That is, $\inf_{\tau \in \Psi_2} g^*_{\sigma^*,\tau}(\mu) \ge f(x^*, \mu)$. Combining with the previously established
inequality, $g^*(\mu) \le f(x^*, \mu)$, and since clearly $g^*(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu) \ge \inf_{\tau \in \Psi_2} g^*_{\sigma^*,\tau}(\mu)$,
we conclude that $\sigma^*$ is actually an optimal (static) strategy for the player maximizing the
non-reachability probability of $T_{f^*}$, and that $f(x^*, \mu) = \inf_{\tau \in \Psi_2} g^*_{\sigma^*,\tau}(\mu) = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} g^*_{\sigma,\tau}(\mu) =
g^*(\mu) = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} g^*_{\sigma,\tau}(\mu)$.

□

A direct corollary of the proof of Theorem 3.1 is that the player maximizing non-reachability
probability always has an optimal static strategy:

Corollary 3.3. In any Branching Simple Stochastic Game, $G$, where the objective of the players is
to maximize and minimize, respectively, the probability of not reaching a type $T_{f^*}$, the player trying
to maximize this probability always has a deterministic static optimal strategy $\sigma^*$.

In particular, for any max-minPPS, $x = P(x)$, with GFP $g^*$, the max player has an optimal
deterministic policy, $\sigma^*$, for the GFP, such that $g^* = g^*_{\sigma^*,*}$ (where, recall, $g^*_{\sigma^*,*}$ is the GFP of
$x = P_{\sigma^*,*}(x)$).

Proof. Just use the deterministic static optimal strategy $\sigma^*$ for the maximizing player defined in
the proof of Theorem 3.1, which for each type $T_i$ controlled by the max player chooses an action
$a \in A_i$ which maximizes the expression $\sum_{r \in R'(T_i, a)} p_r f(x^*, \alpha_r)$.

Clearly, this also implies the existence of a deterministic optimal policy, $\sigma^*$, for the max player,
for the GFP $g^* = g^*_{\sigma^*,*}$ in any max-minPPS $x = P(x)$. □

The same is not true for the player trying to minimize this non-reachability probability. In other
words, the same is not true for the player trying to maximize the probability of reaching a type $T_{f^*}$.
This is illustrated by the following two examples:

Example 3.1 (In general, there is no randomized static optimal strategy for maximizing the
reachability probability in BMDPs, even when the supremum probability is 1.). Consider a BMDP with
three types: $\{A, B, C\}$. Type $C$ is the goal type (i.e., $C = T_{f^*}$). The BMDP is described by the
following rules for types $A$ and $B$. The only controlled type is $A$. The type $B$ is purely “random”.
The symbol “$\emptyset$” denotes that one of the rules for type $B$ generates, with probability 1/2, the empty
set, containing no objects, from an object of type $B$.

    $A \to AA$
    $A \to B$
    $B \to C$            (probability 1/2)
    $B \to \emptyset$    (probability 1/2)



It is easy to see that for this BMDP, the controller who wishes to maximize the probability
of reaching type $C$, starting with one object of type $A$, can do so with probability at least $1 - \epsilon$, for any
$\epsilon > 0$. The strategy for doing so is the following: first create sufficiently many copies of $A$, namely
$k = \lceil \log_2(1/\epsilon) \rceil$ copies, by using the rule $A \to AA$. Then, for each of the created copies, choose the
“lottery” $B$. Each “lottery” $B$ will, independently, with probability 1/2, reach $C$. This assures that
the total probability of not reaching a $C$ is $2^{-k} \le \epsilon$.
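The arithmetic behind this strategy is easy to check numerically. The sketch below computes the number of copies used for a given $\epsilon$ and the resulting failure probability; the function names are illustrative, and the base-2 logarithm is an assumption consistent with each lottery succeeding with probability 1/2.

```python
from fractions import Fraction
from math import ceil, log2

def copies_needed(eps):
    """k = ceil(log2(1/eps)) copies of A (at least one), as in the strategy above."""
    return max(1, ceil(log2(1 / eps)))

def failure_probability(k):
    """Probability that all k independent lotteries B miss C."""
    return Fraction(1, 2) ** k

eps = Fraction(1, 100)
k = copies_needed(eps)          # log2(100) is about 6.64, so k = 7
miss = failure_probability(k)   # (1/2)^7 = 1/128
```

Since $k$ grows only logarithmically in $1/\epsilon$, the strategy needs few copies, but no single finite $k$ works for every $\epsilon$, which is why the supremum is not attained by this family.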

Thus, the supremum value of reaching $C$ in this BMDP is clearly 1. However, it is also easy
to see that there is no randomized static optimal strategy that achieves this supremum value of
1. This is because any randomized static strategy which places positive probability on the rule
$A \to B$ would, with positive probability $p^*$ bounded away from 0, go extinct starting from a bounded
population of $A$’s (without hitting $C$), whereas a static strategy that never uses $A \to B$ never
creates a $B$ and hence never reaches $C$.

The minPPS for this BMDP has two variables $a, b$ and two equations $a = \min(a^2, b)$ and $b = 1/2$.
This system clearly has only one fixed point: $a^* = 0$, $b^* = 1/2$. However, there is no policy (whether
deterministic or randomized) that gives $(0, 1/2)$ as the GFP of the resulting PPS, for the same
reason given above that the BMDP does not have any optimal static strategy. Note in particular
that if a policy selects for $a$ the first choice, $a^2$, then the resulting PPS is $a = a^2$, $b = 1/2$, and $a$
has value 1 in its GFP, not 0.
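The gap between the GFP of the minPPS and the GFP under a fixed policy can be observed by iterating each system from the all-1 vector. This is a minimal sketch: the helper `iterate_from_one` and the lambda names are illustrative, and only the two deterministic policies are shown.

```python
from fractions import Fraction

half = Fraction(1, 2)

def iterate_from_one(P, steps):
    """Apply P repeatedly, starting from the all-1 vector (a, b) = (1, 1)."""
    x = (Fraction(1), Fraction(1))
    for _ in range(steps):
        x = P(x)
    return x

min_pps = lambda x: (min(x[0] ** 2, x[1]), half)   # a = min(a^2, b), b = 1/2
policy_square = lambda x: (x[0] ** 2, half)        # policy always picks a^2
policy_lottery = lambda x: (x[1], half)            # policy always picks b

a_min = iterate_from_one(min_pps, 6)[0]            # 1, 1, 1/2, 1/4, 1/16, 1/256, 1/65536
a_square = iterate_from_one(policy_square, 6)[0]   # stays at 1
a_lottery = iterate_from_one(policy_lottery, 6)[0] # settles at 1/2
```

The iterates of the minPPS decrease towards 0 by switching from the choice $b$ to the choice $a^2$ once $a$ drops below 1, which is exactly the behaviour a static policy cannot reproduce.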

On the other hand, for this BMDP there is a non-static optimal strategy that achieves the
reachability value 1, namely, do as follows: starting from one $A$, first use $A \to AA$ to create two
$A$’s. Then apply $A \to B$ to the “left” $A$ and apply $A \to AA$ to the “right” $A$. Now we have two
$A$’s and a $B$. The $B$ gives us a chance to reach $C$. On the two $A$’s, we again take the left $A$ to $B$
and the right $A$ to $AA$. Repeat. This way, the population will repeatedly contain two $A$’s and one
$B$ forever, and each time $B$ is created it gives us a positive chance to reach $C$, so we reach $C$ with
probability 1.

It turns out, as we will show later, that for any BSSG, if the reachability value is 1, then
the player maximizing the probability of reachability always has a (not necessarily static) optimal
strategy that achieves this value 1.

This is not the case if the reachability value is strictly less than 1, as we shall show in the next
example, Example 3.2.

On the other hand, if the goal was to minimize the probability of reaching $C$, then starting from $A$
there is a simple strategy in this BMDP that achieves this: deterministically choose the rule $A \to AA$
from all copies of $A$. This ensures that the process never reaches $C$, i.e., reaches $C$ with probability
0. This is clearly an optimal strategy. Indeed, this holds in general: as shown in Corollary 3.3, there
always exists a deterministic static optimal strategy for minimizing the probability of reaching a
given type (i.e., maximizing the probability of not reaching it), in a BMDP or BSSG. □

Example 3.2 (No optimal strategy at all for maximizing reachability probability in a BMDP). We
now give an example of a BMDP where the supremum reachability probability of the designated
type $T_{f^*}$ is $< 1$, and such that there does not exist any optimal strategy (regardless of the memory
or randomness used) that achieves the value.

Consider the following BMDP, where the goal is to maximize the probability of reaching type
$D$:

    $A \to BB$           (probability 2/3)
    $A \to \emptyset$    (probability 1/3)
    $B \to A$
    $B \to C$
    $C \to D$            (probability 1/3)
    $C \to \emptyset$    (probability 2/3)

We claim that:

1. The supremum probability, starting with one $A$, of eventually reaching an object of type $D$ is
1/2.

2. There is no strategy of any kind that achieves probability 1/2.

Proof. 1. First, to see that the supremum probability starting at $A$ is 1/2, consider the following
sequence of strategies: strategy $\tau_k$, for $k \ge 1$, chooses $B \to A$ for all objects in every multiset
$X_i$ until a multiset is reached in which there are at least $k$ $B$’s. Then, in the next step, it
chooses $B \to C$ for all copies of $B$. In other words, the strategy waits until there are “enough”
$B$’s, and then switches to $B \to C$ for all $B$’s. Note firstly that, with probability at least 1/2,
we will eventually have a population of $B$’s exceeding $k$, for any $k$. Thereafter the probability
of not hitting $D$ will be at most $(2/3)^k$. We can make $k$ as large as we like, and thus we
can make the probability of not hitting $D$, conditioned on reaching population $k$, as small as
we like. So we can make the probability of hitting $D$ as close as we like to 1/2.




This can be seen also from the corresponding minPPS using Theorem 3.1. The minPPS has
three variables $a, b, c$ and equations $a = \frac{2}{3} b^2 + \frac{1}{3}$, $b = \min(a, c)$, $c = \frac{2}{3}$. It is easy to see that
the system has only one fixed point, $a^* = b^* = \frac{1}{2}$, $c^* = \frac{2}{3}$, which is thus the GFP. Hence,
by Theorem 3.1, the reachability value of the BMDP is $1 - a^* = 1/2$. However, there is no
policy of the minPPS (and correspondingly, no static strategy of the BMDP) that achieves
this value. In particular, note that the policy that selects for $b$ the first choice $a$ yields a PPS
$\{a = \frac{2}{3} b^2 + \frac{1}{3},\ b = a,\ c = \frac{2}{3}\}$ with a GFP in which $a$ has value 1, instead of 1/2.

2. To see that in fact there is no strategy (whether static or not) of the BMDP that achieves
probability 1/2, assume, for contradiction, that there does exist a strategy $\sigma$ that achieves
probability 1/2.

Consider any occurrence of B in the history $X_0, X_1, \ldots$ of configurations, such that the rule
$B \to C$ is applied with positive probability to that occurrence of B by the strategy $\sigma$. It is
without loss of generality to assume that such a B exists, because otherwise the probability
of reaching D would be 0.

We claim that the total probability of reaching type D would strictly increase if, instead of
applying action $B \to C$ with positive probability $p'$ on that copy of B, the strategy $\sigma$ is
changed to a strategy $\sigma'$ where that positive probability $p'$ on action $B \to C$ is shifted entirely
to the pure action $B \to A$, and thereafter, in the next step, if on that resulting A the random
rule $A \to BB$ happens to get chosen, the strategy $\sigma'$ then (with the shifted probability $p'$)
immediately applies the rule $B \to C$ to both resulting copies of B.

To see why this switch to strategy $\sigma'$ would strictly increase the probability of reaching D,
note that for any given B, by choosing $B \to C$ deterministically the probability of reaching D
from that copy of B becomes exactly 1/3. On the other hand, by choosing $B \to A$ from that
copy of B and thereafter (with 2/3 probability) choosing $B \to C$ on the resulting two copies
of B, the new probability of hitting D is $\frac{2}{3} \cdot (1 - (2/3)^2) = 10/27 > 1/3$. The same analysis
shows that even if the original strategy $\sigma$ only chose $B \to C$ with positive probability $p' > 0$,
then shifting that probability over to the two-step strategy, first choosing $B \to A$, achieves
strictly greater probability of reaching D. Since this analysis holds for any copy of B that
occurs in the trajectory $X_0, X_1, \ldots$ of the process, we see that we can always strictly increase
the probability of reaching D by delaying the application of the rule $B \to C$ one step further.
However, note that we cannot delay application of the rule $B \to C$ forever: if we do so then
the probability of reaching D is actually 0.

Thus, the supremum probability of reaching D is only achieved in the limit by a sequence
of strategies, which delay the use of $B \to C$ longer and longer, but is never attained by any
single strategy.

We have already seen that the supremum probability of reaching D is at least 1/2, using the
sequence of strategies described in part (1.) above. Now, to see why the supremum value is
indeed 1/2, note that if we do indeed delay forever using $B \to C$, then starting with one B or
one A the process becomes extinct with probability 1/2 (without ever seeing a D). Thus, if
we delay using $B \to C$ for "long enough", then the process becomes extinct with probability
at least $1/2 - \epsilon$ without seeing D, for an arbitrarily small $\epsilon > 0$. So, the supremum value
of the reachability probability can be at most 1/2, and thus is equal to 1/2. Moreover, we
have already argued that this supremum value is not achieved by any strategy, because we
can always achieve strictly higher probability of reaching D by delaying the use of $B \to C$
one step further. Thus, 1/2 is the supremum value, but is not achieved by any strategy.


□ 
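The numbers in this example can be checked with a few lines of code. The following is a minimal sketch, not the P-time algorithm of the paper: it iterates the minPPS operator of the example downward from the all-1 vector, which here converges to the GFP, and it recomputes the one-step-delay comparison from part 2 of the proof exactly. The function names `P` and `gfp_by_iteration` are illustrative choices.

```python
from fractions import Fraction

def P(v):
    """Apply the example's minPPS operator once to v = (a, b, c)."""
    a, b, c = v
    return (2 / 3 * b * b + 1 / 3,  # A -> BB w.p. 2/3, A dies out w.p. 1/3
            min(a, c),              # the controller picks B -> A or B -> C
            2 / 3)                  # C -> D w.p. 1/3, so C misses D w.p. 2/3

def gfp_by_iteration(steps=400):
    """Iterate P downward from the all-1 vector (illustration only)."""
    v = (1.0, 1.0, 1.0)
    for _ in range(steps):
        v = P(v)
    return v

a_star, b_star, c_star = gfp_by_iteration()

# Exact version of the one-step-delay comparison from part 2 of the proof.
immediate = Fraction(1, 3)                            # apply B -> C now
delayed = Fraction(2, 3) * (1 - Fraction(2, 3) ** 2)  # B -> A, then B -> C on both copies
```

The iterates approach $(1/2, 1/2, 2/3)$, so the reachability value is $1 - a^* = 1/2$, while `delayed` evaluates to 10/27, which exceeds `immediate` = 1/3.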


4 P-time detection of GFP g* = 1 for max-minPPSs and BSSGs 

In this section we will show that there are (easy) P-time algorithms to compute, for a given
max-minPPS, the variables that have value 1 in the GFP, and thus also to decide, for a given BSSG (or
BMDP), whether $g^*_i = 1$ (i.e., whether the non-reachability value, starting from a given type $T_i$, is
1). The algorithm does not require looking at the precise values of the coefficients of the polynomials
in the max-minPPS (respectively, it does not depend on the probabilities labelling the transitions of
the BSSG): it only depends on the qualitative "structure" of the max-minPPS (the BSSG). As we
show, it reduces to an AND-OR graph reachability problem.

Recall that in the AND-OR graph reachability problem, we are given a directed graph $G$, whose
nodes are partitioned into a set $T$ of target nodes, a set $V_1$ of OR nodes and a set $V_2$ of AND nodes.
The set of nodes that can AND-OR reach $T$ is defined to be the (unique) smallest set $S$ of nodes that
includes $T$ and which has the property that (i) an OR-node $u$ is in $S$ iff at least one of its immediate
successors is in $S$, and (ii) an AND-node $u$ is in $S$ iff all its immediate successors are in $S$. This set
can be computed easily by an iterative algorithm that initializes $S$ to $T$, and then repeatedly adds
to $S$ any OR-node $v$ that has an immediate successor already in $S$, and any AND-node all of whose
immediate successors are already in $S$, until there are no more changes to $S$. As is well-known, the
algorithm can be implemented in linear time. Equivalently, the AND-OR reachability problem can
be viewed as a two-person zero-sum reachability game, where the OR-nodes belong to player 1 who
wants to reach some node in the target set $T$, and the AND-nodes belong to player 2 who wants to
avoid this. The set of winning nodes for player 1 is precisely the set $S$ of nodes that can AND-OR
reach $T$; a winning strategy $\tau$ for player 1 from each OR-node in $S$ is to pick an immediate successor
that was added earlier to $S$. The complementary set of nodes is winning for player 2; a winning
strategy $\sigma$ for player 2 from each AND-node that is not in $S$ is to pick an immediate successor that
is not in $S$ (there must be one, otherwise the AND-node would have been added to $S$).
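The iterative algorithm just described can be sketched as follows. This is one possible linear-time implementation using predecessor lists and per-AND-node counters; the function name `and_or_reach`, the dictionary-based graph encoding, and the small example graph are illustrative assumptions, not taken from the paper.

```python
from collections import deque

def and_or_reach(succ, or_nodes, and_nodes, targets):
    """Compute the set S of nodes that can AND-OR reach `targets`.

    succ maps every node to the list of its immediate successors.
    Returns (S, choice), where choice[u] is a successor of OR-node u that
    was added to S before u (a winning move for player 1).
    """
    pred = {u: [] for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    # For each AND-node, the number of successor edges not yet known to lead into S.
    missing = {u: len(succ[u]) for u in and_nodes}
    S, choice = set(targets), {}
    queue = deque(targets)
    for u in and_nodes:  # an AND-node with no successors qualifies vacuously
        if missing[u] == 0 and u not in S:
            S.add(u)
            queue.append(u)
    while queue:
        v = queue.popleft()
        for u in pred[v]:
            if u in S:
                continue
            if u in or_nodes:
                choice[u] = v
                S.add(u)
                queue.append(u)
            elif u in and_nodes:
                missing[u] -= 1
                if missing[u] == 0:
                    S.add(u)
                    queue.append(u)
    return S, choice

# Hypothetical example: "t" is the target, o* are OR-nodes, a* are AND-nodes.
succ = {"t": [], "o1": ["a1", "n"], "a1": ["t", "o2"],
        "o2": ["t"], "n": ["n"], "a2": ["t", "n"]}
S, choice = and_or_reach(succ, or_nodes={"o1", "o2", "n"},
                         and_nodes={"a1", "a2"}, targets={"t"})
```

Each edge is examined at most once, which gives the linear running time. The returned `choice` map is the winning strategy for player 1 on its winning OR-nodes; for player 2, any successor outside `S` of an AND-node outside `S` is a winning move.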

Proposition 4.1. There is a P-time algorithm that, given a max-minPPS (and thus also a maxPPS
or minPPS), $x = P(x)$, with $n$ variables and with GFP $g^* \in [0,1]^n$, and given $i \in [n]$, decides
whether $g^*_i = 1$ or $g^*_i < 1$. The same result holds for determining, for a given BSSG with
non-reachability objective, whether the value of the game is 1. Moreover, in the case where $g^*_i = 1$, the
algorithm computes a deterministic policy (i.e., deterministic static strategy in the BSSG case) $\sigma$
for the max player which forces $g^*_i = 1$. Likewise, if $g^*_i < 1$, the algorithm computes a deterministic
static policy $\tau$ for the min player which forces $g^*_i < 1$.

Proof. For simplicity, we assume w.l.o.g. that the max-minPPS, $x = P(x)$, is in SNF form. Consider
the dependency graph $G = (V, E)$ on the variables $V = \{x_1, \ldots, x_n\}$ of $x = P(x)$. The edges $E$
are defined as follows: $(x_i, x_j) \in E$ if and only if $x_j$ appears in one of the monomials with positive
coefficient that appear on the right hand side of $P_i(x)$.

Let us call a variable $x_i$ deficient if $P_i(x)$ has form L and the coefficients and constant term in
$P_i(x)$ sum to strictly less than 1; equivalently, $x_i$ is deficient iff $P_i(\mathbf{1}) < 1$. Let $Z \subseteq \{x_1, \ldots, x_n\}$
denote the set of deficient variables.
Let $X = V \setminus Z$ denote the remaining set of non-deficient variables. We partition the remaining
variables $X = L \cup Q \cup M$ according to the form of the corresponding SNF-form equation $x_i = P_i(x)$. In
fact, we further partition the variables $M$ as $M = M_{\max} \cup M_{\min}$, according to whether the corresponding
RHS for that variable has the form $\max\{x_j, x_k\}$ or $\min\{x_j, x_k\}$.

We can now view the dependency graph $G$ as a (non-probabilistic) AND-OR game graph, namely
a 2-player reachability game graph, in which the goal of player 1 is to reach a node in $Z$, whereas the
goal of player 2 is to avoid this. The nodes of the game graph belonging to player 1 are $L \cup Q \cup M_{\min}$
(these are the OR nodes), the nodes of the game graph belonging to player 2 are $M_{\max}$ (these
are the AND nodes), and finally the nodes in $Z$ are the target nodes (from which player 1 wins
automatically).

Let $S$ be the set of nodes that can AND-OR reach $Z$, i.e. the set of nodes from which player 1
can win, let $\bar{S}$ be the complementary set of nodes from which player 2 wins, and let $\tau, \sigma$ be winning
(deterministic, static) strategies for the two players from their respective winning sets, as described
before the proposition (the definition of the strategies on their sets of losing nodes is irrelevant). As
we mentioned earlier, the sets $S, \bar{S}$ and the strategies $\tau, \sigma$ can be computed in P-time (in fact, in
linear time).

We claim that for every variable $x_i$, we have $g^*_i < 1$ if and only if $x_i \in S$.

For the one direction, we can show that $g^*_i < 1$, and in fact $(g^*_{*,\tau})_i < 1$, for all $x_i \in S$, by
induction on the time that $x_i$ was added to $S$ in the iterative algorithm. For the basis case, $x_i \in Z$
is a deficient node, i.e. $P_i(\mathbf{1}) < 1$, and hence clearly $g^*_i \le (g^*_{*,\tau})_i = P_i(g^*_{*,\tau}) \le P_i(\mathbf{1}) < 1$. For the
induction step, if $x_i$ is of type $M_{\min}$ and $\tau$ chooses $x_j \in P_i(x)$ for $x_i$, then $x_j$ was added earlier to $S$,
thus $g^*_i \le (g^*_{*,\tau})_i = (g^*_{*,\tau})_j < 1$. The other cases, when $x_i$ is of type $L$, $Q$, $M_{\max}$, are similar.

To see the other direction, $g^*_i = 1$, and in fact $(g^*_{\sigma,*})_i = 1$, for all $x_i \in \bar{S}$, note that the dependency
graph of the minPPS $x = P_{\sigma,*}(x)$ has no edges from $\bar{S}$ to $S$: all variables of type $L \cup Q \cup M_{\min}$ of $\bar{S}$
depend only on variables in $\bar{S}$ (otherwise, they would have been added to $S$), and for variables of
type $M_{\max}$, policy $\sigma$ selected a successor in $\bar{S}$. Furthermore, $\bar{S}$ does not contain any deficient node,
thus $P_i(\mathbf{1}) = 1$ for all $x_i \in \bar{S}$. Therefore, the subsystem of $x = P_{\sigma,*}(x)$ induced by $\bar{S}$ has the all-1
vector as a fixed point, hence $(g^*_{\sigma,*})_i = 1$ (and thus $g^*_i = 1$), for all $x_i \in \bar{S}$. □
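The reduction in this proof can be sketched in code as follows. The tuple encoding of SNF equations and the function name `vars_with_gfp_below_one` are assumptions made for this illustration; for brevity the sketch uses a naive fixed-point loop rather than the linear-time algorithm, and the five-variable system is a made-up example whose GFP is easy to compute by hand.

```python
from fractions import Fraction as F

def vars_with_gfp_below_one(eqs):
    """Return {i : g*_i < 1} for a max-minPPS in SNF.

    eqs[i] is ('L', {j: coeff}, const), ('Q', j, k), ('max', j, k) or ('min', j, k).
    Only the support of the coefficients and the deficiency test are used.
    """
    succ, is_and, S = [], [], set()
    for i, eq in enumerate(eqs):
        if eq[0] == 'L':
            coeffs, const = eq[1], eq[2]
            succ.append([j for j, c in coeffs.items() if c > 0])
            if sum(coeffs.values()) + const < 1:
                S.add(i)                   # deficient variable: a target node in Z
        else:
            succ.append([eq[1], eq[2]])
        is_and.append(eq[0] == 'max')      # M_max nodes are the AND-nodes
    changed = True
    while changed:                         # naive fixed-point loop, quadratic time
        changed = False
        for i in range(len(eqs)):
            if i in S:
                continue
            hits = [j in S for j in succ[i]]
            if (all(hits) if is_and[i] else any(hits)):
                S.add(i)
                changed = True
    return S

eqs = [
    ('L', {0: F(1, 2)}, F(1, 2)),  # x0 = x0/2 + 1/2    (g* = 1)
    ('L', {1: F(1, 2)}, F(1, 4)),  # x1 = x1/2 + 1/4    (deficient, g* = 1/2)
    ('max', 0, 1),                 # x2 = max(x0, x1)   (g* = 1)
    ('min', 0, 1),                 # x3 = min(x0, x1)   (g* = 1/2)
    ('Q', 2, 3),                   # x4 = x2 * x3       (g* = 1/2)
]
below_one = vars_with_gfp_below_one(eqs)
```

On this example the procedure reports exactly the variables $x_1, x_3, x_4$, matching the hand-computed GFP $(1, 1/2, 1, 1/2, 1/2)$.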

We will consider detection of $g^*_i = 0$ for max-minPPSs with GFP $g^*$ later in the paper. We
shall see that for maxPPSs, after detection and removal of variables $x_i$ such that $g^*_i = 1$, so that
$g^* < \mathbf{1}$, the GFP $g^*$ of the residual maxPPS is equal to the LFP $q^*$ of the residual maxPPS, and
thus detecting whether $g^*_i = q^*_i = 0$ can be done in P-time via simple AND-OR graph analysis using
the algorithm given in [14].

For minPPSs, however, the above reduction does not hold, and in fact the P-time algorithm
for detecting whether $g^*_i = 0$ is substantially more complicated (but still does not involve knowing
the actual coefficients of the polynomials in the minPPS, or the probabilities labeling rules of the
BMDP, only its structure). We provide such a P-time algorithm for deciding whether $g^*_i = 0$, not
only for minPPSs, but also for the more general max-minPPSs, in Section 9.

5 Reachability for BPs, and linear degeneracy 

In this section we study the reachability problem for purely stochastic BPs. Along the way, we
establish several lemmas which will be crucial for our analysis of BMDPs. We start by defining the
notion of linear degeneracy.

A PPS $x = P(x)$ is called linear degenerate if every polynomial $P_i(x)$ is linear, with no constant
term, and all coefficients sum to 1. Thus $x = P(x)$ is linear degenerate if $P_i(x) = \sum_{j=1}^{n} p_{ij} x_j$, where
$p_{ij} \in [0,1]$ for all $i, j \in [n]$, and $\sum_j p_{ij} = 1$ for all $i \in [n]$. We refer to a linear degenerate PPS as an LD-PPS.

Note that for any LD-PPS, $x = P(x)$, we have $P(\mathbf{0}) = \mathbf{0}$ and $P(\mathbf{1}) = \mathbf{1}$, so the LFP is $q^* = \mathbf{0}$ and
the GFP is $g^* = \mathbf{1}$. The Jacobian $B(x)$ of an LD-PPS is a constant stochastic matrix $B$ (independent
of $x$), where every row of $B$ is non-negative and sums to 1. During the evolution of the associated
BP, the size of the population remains constant. Thus, if we start with a single object, the MT-BP
trajectory $X_0, X_1, \ldots$ is simply the trajectory of a finite-state Markov chain whose states correspond
to types, and where the singleton set $X_i$ corresponds to the one object in the population at time $i$.
Note that the Jacobian $B(x) = B$ is the transition matrix of the corresponding finite-state Markov
chain. Furthermore, observe that for any LD-PPS we have $P(x) = Bx$.

Given a PPS, we can construct its dependency graph and decompose it into strongly connected
components (SCCs). A bottom SCC is an SCC that has no outgoing edges. The following lemma
is immediate:

Lemma 5.1. For any PPS, $x = P(x)$, exactly one of the following two cases holds:

(i) $x = P(x)$ contains a linear degenerate bottom strongly-connected component (BSCC), $S$, i.e.,
$x_S = P_S(x_S)$ is an LD-PPS, and $P_S(x_S) = B_S x_S$, for a stochastic matrix $B_S$.

(ii) every variable $x_i$ either is, or depends (directly or indirectly) on, a variable $x_j$ where $P_j(x)$
has one of the following properties:

1. $P_j(x)$ has a term of degree 2 or more,

2. $P_j(x)$ has a non-zero constant term, i.e. $P_j(\mathbf{0}) > 0$, or

3. $P_j(\mathbf{1}) < 1$.

A PPS $x = P(x)$ is called a linear-degenerate-free PPS (LDF-PPS) if it satisfies condition (ii)
of Lemma 5.1.
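Condition (ii) can be tested directly by a backward search in the dependency graph, without computing SCCs. The following is a minimal sketch under an assumed encoding (each $P_i$ is a list of `(coefficient, [variable indices])` monomials, with an empty index list for the constant term); the name `is_ldf` and the three example systems are illustrative and not part of the paper.

```python
from fractions import Fraction as F

def is_ldf(pps):
    """Check condition (ii) of Lemma 5.1 for a PPS given as monomial lists."""
    n = len(pps)
    witness = set()
    pred = [[] for _ in range(n)]
    for i, poly in enumerate(pps):
        terms = [(c, vs) for c, vs in poly if c > 0]
        has_nonlinear = any(len(vs) >= 2 for _, vs in terms)   # property 1
        has_constant = any(len(vs) == 0 for _, vs in terms)    # property 2
        deficient = sum(c for c, _ in terms) < 1               # property 3: P_i(1) < 1
        if has_nonlinear or has_constant or deficient:
            witness.add(i)
        for _, vs in terms:
            for j in vs:
                pred[j].append(i)       # x_i depends on x_j
    # Backward search: collect all variables that are, or depend on, a witness.
    reached, stack = set(witness), list(witness)
    while stack:
        j = stack.pop()
        for i in pred[j]:
            if i not in reached:
                reached.add(i)
                stack.append(i)
    return len(reached) == n

ex1 = [[(F(2, 3), [0, 0]), (F(1, 3), [])]]                  # x0 = (2/3) x0^2 + 1/3
ex2 = [[(F(1, 2), [0]), (F(1, 2), [1])], [(F(1), [0])]]     # an LD-PPS
ex3 = [[(F(1, 2), [1]), (F(1, 2), [2])],                    # x0 = x1/2 + x2/2
       [(F(1), [1])],                                       # x1 = x1 (LD bottom SCC)
       [(F(1, 2), [])]]                                     # x2 = 1/2
```

Here `ex1` is an LDF-PPS, while `ex2` and `ex3` are not: in `ex3` the variable $x_1$ forms a linear degenerate bottom SCC even though $x_0$ depends on the constant equation for $x_2$.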

Lemma 5.2. If a PPS, $x = P(x)$, has either GFP $g^* < \mathbf{1}$, or LFP $q^* > \mathbf{0}$, then $x = P(x)$ is an
LDF-PPS.

Proof. Suppose that for a PPS, $x = P(x)$, condition (i) of Lemma 5.1 holds, i.e., there is a bottom
SCC $S$ with $P_S(x_S) = B_S x_S$ for a stochastic matrix $B_S$. Then $P_S(\mathbf{0}) = \mathbf{0}$ and $P_S(\mathbf{1}) = \mathbf{1}$. So
$g^*_S = \mathbf{1}$ and $q^*_S = \mathbf{0}$, which contradicts the assumptions. So, condition (ii) must hold, i.e. $x = P(x)$
is an LDF-PPS. □

We use $\rho(A)$ to denote the spectral radius of a matrix $A$. A basic property that we use is that,
if $A$ is a non-negative matrix and $\rho(A) < 1$, then the matrix $I - A$ is nonsingular, and its inverse
$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k$ is non-negative (see e.g. [17]).

We will often use also the following lemma from [11] (stated there more generally for monotone
polynomial systems).

Lemma 5.3. ([11], Lemma 3.3.) Let $x = P(x)$ be a PPS, with $n$ variables, in SNF form, and let
$a, b \in \mathbb{R}^n$. Then: $P(a) - P(b) = B(\frac{a+b}{2})(a - b)$.

The following is a strengthened variant of Lemma 2.12 from [12].
Lemma 5.4 (cf. Lemma 2.12 of [12]). For any (w.l.o.g., quadratic) LDF-PPS, $x = P(x)$, with LFP
$q^*$, and for $0 \le y < \frac{1}{2}(\mathbf{1} + q^*)$, we have $\rho(B(y)) < 1$ and so $(I - B(y))^{-1}$ exists and is non-negative,
and thus $\mathcal{N}(y)$ is well-defined.

Proof. The spectral radius $\rho(A)$ of a square non-negative matrix, $A$, is equal to the maximum of
the spectral radii of its principal irreducible submatrices (see, e.g., [17], Chapter 8). Any principal
irreducible submatrix of $B(y)$ is a principal irreducible submatrix of $B_S(y)$ for some SCC $S$ of the
dependency graph of $x = P(x)$ ($B_S(y)$ itself might not be irreducible, since we do not assume
$y > \mathbf{0}$). So to show that $\rho(B(y)) < 1$, it suffices to show that for any SCC $S$, $\rho(B_S(y)) < 1$.

For a trivial SCC, one where $S = \{x_i\}$ for a single variable $x_i$ which does not appear in $P_i(x)$,
$B_S(y)$ is the zero matrix, so $\rho(B_S(y)) = 0 < 1$.

Now we consider SCCs which are non-trivial and contain an equation of form Q, $x_i = P_i(x)$.
Here $P_i(x) = x_j x_k$ for some $j, k$ must contain at least one variable, say w.l.o.g. $x_j$, which is also in $S$,
or we would have the above trivial case. We have $B_S(y) \le B_S(\frac{1}{2}(\mathbf{1} + q^*))$ by monotonicity of $B(x)$.
But $(B_S(y))_{i,j} = y_k < \frac{1}{2}(1 + q^*_k) = (B_S(\frac{1}{2}(\mathbf{1} + q^*)))_{i,j}$. So the inequality $B_S(y) \le B_S(\frac{1}{2}(\mathbf{1} + q^*))$ is
strict in the $(i,j)$ entry. Since the matrix $B_S(\frac{1}{2}(\mathbf{1} + q^*))$ is irreducible, $\rho(B_S(y)) < \rho(B_S(\frac{1}{2}(\mathbf{1} + q^*)))$
(again, see e.g. [17]). So it suffices to show that $\rho(B_S(\frac{1}{2}(\mathbf{1} + q^*))) \le 1$.

There are two cases. Firstly suppose $q^*_S = \mathbf{1}$. Then any SCC $D$ that $S$ depends on also has
$q^*_D = \mathbf{1}$. So $B_S(\frac{1}{2}(\mathbf{1} + q^*)) = B_S(\mathbf{1}) = B_S(q^*)$. But we know ([E], [H]) that $\rho(B(q^*)) \le 1$, so we
have that $\rho(B_S(\frac{1}{2}(\mathbf{1} + q^*))) = \rho(B_S(q^*)) \le \rho(B(q^*)) \le 1$.

Secondly suppose that $q^*_S \ne \mathbf{1}$. Then $q^*_S < \mathbf{1}$. Applying Lemma 5.3 with $a = \mathbf{1}$ and $b = q^*$, we
have that $B(\frac{1}{2}(\mathbf{1} + q^*))(\mathbf{1} - q^*) = P(\mathbf{1}) - P(q^*) \le (\mathbf{1} - q^*)$. Since $B(\frac{1}{2}(\mathbf{1} + q^*))$ is non-negative and
$\mathbf{1} - q^* \ge \mathbf{0}$, we have that $B_S(\frac{1}{2}(\mathbf{1} + q^*))(\mathbf{1} - q^*_S) \le (\mathbf{1} - q^*_S)$. By standard facts of Perron-Frobenius
theory, since $\mathbf{1} - q^*_S > \mathbf{0}$ and $B_S(\frac{1}{2}(\mathbf{1} + q^*))(\mathbf{1} - q^*_S) \le (\mathbf{1} - q^*_S)$, it follows that $\rho(B_S(\frac{1}{2}(\mathbf{1} + q^*))) \le 1$.
So in either case we have $\rho(B_S(y)) < \rho(B_S(\frac{1}{2}(\mathbf{1} + q^*))) \le 1$.

Finally we consider SCCs which contain only equations of form L. Here $B_S(y)$ is irreducible
since $B_S(x)$ is a constant matrix and so if $i$ depends on $j$, $B_{ij}(y) \ne 0$. $B_S(y)$ is also substochastic
since all the entries in the $i$'th row are coefficients in $P_i(x)$ and $x = P(x)$ is a PPS. Since $x = P(x)$
is an LDF-PPS, $B_S(y)$ is not stochastic, since otherwise $S$ would be a bottom linear degenerate SCC.
So there is an irreducible stochastic matrix $A$ with $B_S(y) \le A$ with strict inequality in some entry.
This implies $\rho(B_S(y)) < \rho(A) = 1$. □

Lemma 5.5. For any LDF-PPS, $x = P(x)$, and $y < \mathbf{1}$, if $P(y) \le y$ then $y \ge q^*$, and if $P(y) \ge y$,
then $y \le q^*$. In particular, if $q^* < \mathbf{1}$, then $q^*$ is the only fixed-point $q$ of $x = P(x)$ with $q < \mathbf{1}$.

Proof. Since $y < \mathbf{1}$, $\frac{1}{2}(y + q^*) < \frac{1}{2}(\mathbf{1} + q^*)$. By Lemma 5.4, $(I - B(\frac{1}{2}(y + q^*)))^{-1}$ exists and is
non-negative. Lemma 5.3 yields that $P(y) - q^* = B(\frac{1}{2}(y + q^*))(y - q^*)$. Re-arranging this gives
$q^* - y = (I - B(\frac{1}{2}(y + q^*)))^{-1}(P(y) - y)$. So when $P(y) - y \ge 0$ we also have $q^* - y \ge 0$, and when
$P(y) - y \le 0$ we also have $q^* - y \le 0$. That is, if $P(y) \le y$ then $y \ge q^*$, and if $P(y) \ge y$, then $y \le q^*$.

Suppose $q < \mathbf{1}$ is a fixed point, i.e. $P(q) = q$. Then both $P(q) \ge q$ and $P(q) \le q$, so both $q \le q^*$
and $q \ge q^*$. Thus $q = q^*$. □
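As a sanity check of the rearranged identity used in this proof, the following sketch evaluates both sides exactly on a one-variable LDF-PPS. The system $x = \frac{2}{3}x^2 + \frac{1}{3}$ (whose fixed points are $1/2$ and $1$, so $q^* = 1/2$) is chosen here purely for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction as F

def P(x):
    """The one-variable PPS x = (2/3) x^2 + 1/3."""
    return F(2, 3) * x * x + F(1, 3)

def B(x):
    """Its Jacobian, here the 1x1 matrix dP/dx = (4/3) x."""
    return F(4, 3) * x

q_star = F(1, 2)  # least fixed point; the only other fixed point is 1

def rhs(y):
    """(I - B((y + q*)/2))^{-1} (P(y) - y), which the proof shows equals q* - y."""
    return (P(y) - y) / (1 - B((y + q_star) / 2))

samples = [F(0), F(1, 4), F(3, 4), F(9, 10)]
checks = [rhs(y) == q_star - y for y in samples]
```

For $y = 1/4$ we have $P(y) = 3/8 \ge y$ and indeed $y \le q^*$; for $y = 3/4$ we have $P(y) = 17/24 \le y$ and $y \ge q^*$, in line with the lemma.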

We shall need the following fact about BPs later. 

Lemma 5.6. For a BP, if the PPS associated with its extinction probabilities (see m) is an
LDF-PPS, $x = P(x)$, and if all types have extinction probability $q^*_i < 1$, then for any non-empty population $z$ and
any initial population, the probability that $z$ occurs infinitely often is 0. Consequently, starting with
any initial population, with probability 1 either the process becomes extinct or the population goes to
infinity.

Proof. Let $G$ be the dependency graph of the branching process. Suppose first that $G$ is strongly
connected. We claim then that almost surely (with probability 1) the process either becomes extinct
or grows without bound (for any initial population). This can be shown easily using the results in
[16] in the so-called positive regular (primitive) moment matrix case. We give a direct proof. Suppose
first that all types have positive extinction probability, $q^* > \mathbf{0}$. Let $X_k$ denote the population at
time $k$, for $k \ge 0$. Then for every population $z \ne 0$, the probability $\Pr(X_k = 0 \mid X_0 = z) > 0$ for some
large enough $k$, and for all $k' \ge k$. Hence the population $z$ is a transient state of the underlying
countable-state Markov chain of the BP, that is, the probability that $z$ occurs infinitely often is 0.
Since this holds for every $z \ne 0$, the process almost surely either becomes extinct or grows without
bound.

Suppose now that there are some types $i$ with extinction probability $q^*_i = 0$, and let $Z$ be the set
of all such types. Then every rule of every type in $Z$ includes in the offspring at least one element
of $Z$. So the population of objects with type in $Z$ can never go down. Since the process is not linear
degenerate, at least one type $i^*$ of $Z$ has a rule $r^*$ with two or more offspring. Since $G$ is strongly
connected, if we start with an object of any type, with positive probability, the process will generate
within $n$ steps an object of type $i^*$, apply rule $r^*$, and within another $n$ steps, the (at least) two
offspring can generate two objects with type in $Z$. If the process does not go extinct, this happens
infinitely often almost surely, and since the number of objects with type in $Z$ never goes down, this
implies that the size goes to infinity. Hence, with probability 1, the process either goes extinct or
grows without bound. Thus, the lemma holds if $G$ is strongly connected.

Consider now a branching process with a dependency graph $G$ that is not strongly connected.
Suppose that there is positive probability that a population $z$ occurs infinitely often. Let $i$ be the
type of an object in $z$ and let $j$ be a type reachable from $i$ that is in a bottom strongly connected
component $S$. Every time there is an object of type $i$ in the population, there is positive probability
that it will generate later on an object of type $j$. Since $z$ occurs infinitely often, almost surely the
process will contain also infinitely often objects of type $j$. Since $q^*_j < 1$, the process starting with
a single object of type $j$ grows without bound with positive probability. Since objects of type $j$
occur infinitely often, the probability that the process stays bounded is 0.

□ 

Lemma 5.7. If $x = P(x)$ is a PPS with GFP $g^*$ such that $\mathbf{0} \le g^* < \mathbf{1}$, then $g^*$ is the unique fixed
point solution of $x = P(x)$ in $[0,1]^n$. In other words, $g^* = q^*$, where $q^*$ is the LFP of $x = P(x)$.

Proof. Since $g^* < \mathbf{1}$, by Lemma 5.2, $x = P(x)$ is an LDF-PPS. Thus, since $P(g^*) \ge g^*$, it follows by
Lemma 5.5 that $q^* = g^*$. □

Proposition 5.8. (cf. also Proposition 5&6, and Lemma 20; and m) Given a PPS, $x = P(x)$,
with GFP $g^*$, and given any integer $j > 0$, there is an algorithm that computes a rational vector
$v \le g^*$ with $\|g^* - v\|_\infty \le 2^{-j}$, in time polynomial in $|P|$ and $j$.

Proof. By Proposition 4.1, it is without loss of generality to assume that $g^* < \mathbf{1}$, because we first
preprocess $x = P(x)$, and remove the variables $x_i$ such that $g^*_i = 1$, plugging in 1 in their place
on RHSs of other equations. So, we assume w.l.o.g. that the PPS $x = P(x)$ satisfies $g^* < \mathbf{1}$. By Lemma
5.7, $x = P(x)$ has a unique fixed point in $[0,1]^n$, and $g^* = q^*$, where $q^*$ is the LFP. We can then
simply apply the algorithm from m, to approximate the LFP $q^* = g^*$ of $x = P(x)$ within $j$ bits of
precision in time polynomial in $|P|$ and $j$. □
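To give a feel for how quickly an LFP with $q^* < 1$ can be approximated, the sketch below runs exact Newton iteration from 0 on the one-variable PPS $x = \frac{2}{3}x^2 + \frac{1}{3}$, whose LFP (and, by Lemma 5.7, GFP after removing nothing) is... in fact only its LFP $1/2$, since here $g^* = 1$; the example is meant to illustrate LFP approximation only. It assumes the usual Newton operator $\mathcal{N}(y) = y + (I - B(y))^{-1}(P(y) - y)$ and performs no rounding, so it is not the cited polynomial-time algorithm, whose details are not reproduced in this section.

```python
from fractions import Fraction as F

def P(x):
    """The one-variable PPS x = (2/3) x^2 + 1/3, with LFP q* = 1/2."""
    return F(2, 3) * x * x + F(1, 3)

def dP(x):
    """Derivative of P, i.e. the 1x1 Jacobian B(x)."""
    return F(4, 3) * x

def newton_iterates(steps):
    """Exact Newton iterates x_{k+1} = x_k + (P(x_k) - x_k) / (1 - P'(x_k)), from x_0 = 0."""
    xs = [F(0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + (P(x) - x) / (1 - dP(x)))
    return xs

xs = newton_iterates(5)
```

The iterates $0, 1/3, 7/15, \ldots$ increase monotonically towards $1/2$ from below, and after five steps the error is already below $2^{-30}$.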


6 Approximating the GFP of a maxPPS in P-time 

In this section, we will show that we can approximate the GFP of a maxPPS and compute an 
e-optimal deterministic policy in polynomial time. We show also that we can determine easily if the 
value is 0. 

We call a policy $\sigma$ for a max/minPPS, $x = P(x)$, linear degenerate free (LDF) if its associated
PPS $x = P_\sigma(x)$ is an LDF-PPS.

Lemma 6.1. For any maxPPS, $x = P(x)$, if GFP $g^* < \mathbf{1}$ then $g^*$ is the unique fixed point of
$x = P(x)$ in $[0,1]^n$. In other words, $g^* = q^*$, where $q^*$ is the LFP of $x = P(x)$.

Proof. Suppose $x = P(x)$ is a maxPPS with GFP $g^* < \mathbf{1}$.

We know, by Corollary 3.3, that there is a deterministic optimal policy for achieving the GFP
for $x = P(x)$, i.e., there is a deterministic policy $\sigma$ such that $g^* = g^*_\sigma$, where $g^*_\sigma$ is the GFP of the
PPS $x = P_\sigma(x)$. (Namely, $\sigma$ just picks, from each type, an action that maximizes the RHS of the
corresponding equation evaluated at $g^*$.)

Let $\sigma$ be such an optimal policy. Then $\mathbf{0} \le g^*_\sigma = g^* < \mathbf{1}$. By Lemma 5.7, this implies $\mathbf{0} \le q^*_\sigma =
g^*_\sigma < \mathbf{1}$. Next, we observe the following easy fact:

Lemma 6.2. For all $z, z' \in [0,1]^n$, if $z \le z'$ then $P_\sigma(z) \le P(z')$.

Proof. This holds because $P(z) \le P(z')$ by monotonicity of $P(x)$, and because each expression
$(P(z))_i$ in $P(z)$ consists of the max operator applied to a set of monotone polynomial terms, which
include among them the monotone polynomial $(P_\sigma(z))_i$, and thus $P_\sigma(z) \le P(z)$. □

Now we consider "value iteration" starting from the all-0 vector, on both the PPS $P_\sigma(x)$ and
the maxPPS $P(x)$. Let $x^0 := y^0 := \mathbf{0}$. For $i \ge 1$, let $x^i := P_\sigma^i(\mathbf{0})$ and let $y^i := P^i(\mathbf{0})$. Note that
$x^i \le x^{i+1}$ and $y^i \le y^{i+1}$, for all $i \ge 0$.

We claim that $x^i \le y^i$ for all $i \ge 0$. This holds by induction on $i$: the base case $i = 0$ is by definition.
For $i \ge 0$, assuming $x^i \le y^i$, we have $x^{i+1} = P_\sigma(x^i) \le P(y^i) = y^{i+1}$, where the middle inequality
follows by Lemma 6.2.

By Lemma 5.7, and since $\sigma$ is optimal, we know that $(\lim_{i \to \infty} x^i) = q^*_\sigma = g^*_\sigma = g^*$. We also have
that $(\lim_{i \to \infty} y^i) = q^*$, where $q^*$ is the LFP of the maxPPS $x = P(x)$. But then since $x^i \le y^i$ for
all $i$, it follows that $g^* \le q^*$. But since we always have $q^* \le g^*$, this implies $g^* = q^*$. □

Theorem 6.3. Given a maxPPS, $x = P(x)$, with GFP $g^*$,

1. Given $i \in [n]$, there is an algorithm that determines in P-time whether $g^*_i = 0$, and if $g^*_i > 0$
computes a deterministic policy for the max player that achieves this.

2. Given any integer $j > 0$, there is an algorithm that computes a rational vector $v \le g^*$ with
$\|g^* - v\|_\infty \le 2^{-j}$, and also computes a deterministic policy $\sigma$ such that $\|g^* - g^*_\sigma\|_\infty \le 2^{-j}$, both
in time polynomial in $|P|$ and $j$.

Proof. 1. First apply Proposition 4.1 to remove variables $x_k$ with $g^*_k = 1$, and record the partial
strategy for max on those types that achieves $g^*_k = 1$. The residual maxPPS has $q^* = g^*$ by
Lemma 6.1. Thus, in order to decide whether $g^*_i = q^*_i = 0$, we only need to apply the P-time
algorithm from m to decide whether the extinction probability $q^*_i > 0$. And the AND-OR
graph algorithm for this from [14] also supplies a deterministic policy to achieve $q^*_i > 0$, if this
is the case.

2. Again, we first apply Proposition 4.1, so that, w.l.o.g., we can assume $g^* < \mathbf{1}$. Then by Lemma
6.1, $g^* = q^*$, so that we only need to approximate the LFP $q^*$ of a maxPPS, $x = P(x)$, to within
$j$ bits of precision, and compute a $(2^{-j})$-optimal deterministic policy, in time polynomial in
$|P|$ and $j$. Algorithms that achieve precisely these two things were given in m.

□ 

7 Approximating the GFP of a minPPS in P-time 

In this section we will show the following. 

Theorem 7.1. Given a minPPS, $x = P(x)$, with $g^* < \mathbf{1}$. If we use Generalized Newton's method,
starting at $x^{(0)} := \mathbf{0}$, with rounding parameter $h = j + 2 + 4|P|$, then after $h$ iterations, we have
$\|g^* - x^{(h)}\|_\infty \le 2^{-j}$.

In order to prove this theorem, we need some structural lemmas about the GFPs of minPPSs,
and their relationship to policies. There need not exist any policies $\sigma$ with $g^*_\sigma = g^*$, so we need
policies that can, in some sense, act as "surrogates" for it.

Recall that a policy $\sigma$ for a max/minPPS, $x = P(x)$, is called linear degenerate free (LDF) if
its associated PPS $x = P_\sigma(x)$ is an LDF-PPS. When we consider the minPPS, $x = P(x)$, obtained
from a BMDP for (non)reachability, after eliminating types which cannot reach the target, the LFP
$q^*_\sigma$ of $x = P_\sigma(x)$ for an LDF policy, $\sigma$, has $(q^*_\sigma)_i$ equal to 1 minus the probability that, starting
with one object of type $i$, we reach the target or else generate an infinite number of objects that
can reach the target under policy $\sigma$. It turns out that there is an LDF policy $\sigma^*$ whose associated
LFP is the GFP of the minPPS. Furthermore, it turns out that we can get an $\epsilon$-optimal policy by
following this LDF policy $\sigma^*$ with high probability and with low probability following some policy
that can reach the target from anywhere.

Lemma 7.2. If a minPPS $x = P(x)$ has $g^* < \mathbf{1}$ then:

1. There is a deterministic LDF policy $\sigma$ with $g^*_\sigma < \mathbf{1}$,

2. $g^* \le q^*_\tau$, for any LDF policy $\tau$, and

3. There is a deterministic LDF policy $\sigma^*$ whose associated LFP, $q^*_{\sigma^*}$, has $g^* = q^*_{\sigma^*}$.

(Footnote to part 3: we remark for the reader's intuition (although we shall not prove it) that it can be
shown that any LDF policy $\sigma^*$ for a minPPS that satisfies $q^*_{\sigma^*} = g^* < \mathbf{1}$ has the property that in the
underlying BMDP $\sigma^*$ maximizes the probability of the event of either reaching the target type or else
growing the population of types that can reach the target to infinity.)

Rather than proving Lemma 7.2 here, we will instead prove later on a result for max-minPPSs
(Lemma 9.1 of Section 9), which directly generalizes Lemma 7.2.

Note that the policy $\sigma^*$ described in part (3.) of Lemma 7.2 is not necessarily optimal because
even though $g^* = q^*_{\sigma^*}$, there may be an $i$ with $g^*_i = (q^*_{\sigma^*})_i < (g^*_{\sigma^*})_i = 1$.

We will need also the following lemma from [12] on linearizations of max/minPPSs.

Lemma 7.3. ([12], Lemma 3.5.) Let $x = P(x)$ be any max/minPPS. Suppose that the matrix
inverse $(I - B_\sigma(y))^{-1}$ exists and is non-negative, for some policy $\sigma$, and some $y \in \mathbb{R}^n$, where $B_\sigma$ is
the Jacobian of $P_\sigma$. Then

(i) $\mathcal{N}_\sigma(y)$ is defined, and is equal to the unique point $a \in \mathbb{R}^n$ such that $P^y_\sigma(a) = a$.

(ii) For any vector $x \in \mathbb{R}^n$:

If $P^y_\sigma(x) \ge x$, then $x \le \mathcal{N}_\sigma(y)$.

If $P^y_\sigma(x) \le x$, then $x \ge \mathcal{N}_\sigma(y)$.

We will show now that the Generalized Newton's Method (GNM) is well-defined.


Lemma 7.4. Given a minPPS, $x = P(x)$, with GFP $g^* < \mathbf{1}$, and given $y$ with $\mathbf{0} \le y \le g^*$, there
exists a deterministic LDF policy $\sigma$ with $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$, the GNM operator $I(x)$ is defined at
$y$, and for this policy $\sigma$, $I(y) = \mathcal{N}_\sigma(y)$.


Proof. We first show that there is an LDF policy $\sigma$ with $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$. We will follow a
proof structure similar to Lemma 3.14 from [12].
As there, we will be using policy improvement to show existence of a policy with the desired
properties (but not as an algorithm to compute such a policy). Lemma 7.2 (1.) gives the existence of a
deterministic LDF policy given our assumption that $g^* < \mathbf{1}$. So we start with such an LDF policy
$\sigma_1$. Initially $i = 1$, and we increment $i$ after each policy improvement step.

In the general step $i$ we have a deterministic LDF policy $\sigma_i$. By Lemma 7.2 (2.), $g^* \le q^*_{\sigma_i}$.
Since $y \le g^* < \mathbf{1}$, we have $y < \frac{1}{2}(\mathbf{1} + g^*) \le \frac{1}{2}(\mathbf{1} + q^*_{\sigma_i})$. Thus, we can apply Lemma 5.4 to the
LDF-PPS $x = P_{\sigma_i}(x)$ to conclude that $(I - B_{\sigma_i}(y))^{-1}$ exists and thus $\mathcal{N}_{\sigma_i}(y)$ is well-defined. Let
$z = \mathcal{N}_{\sigma_i}(y)$. By Lemma 7.3, $P^y_{\sigma_i}(z) = z$. So $P^y(z) \le z$. If $P^y(z) = z$, then stop, as we have a policy
$\sigma$ with $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$. Otherwise, there is a $j$ with $(P^y(z))_j < z_j$. $P_j(x)$ has form M since
$(P^y(z))_j < (P^y_{\sigma_i}(z))_j$. Thus $P_j(x) = \min\{x_k, x_{\sigma_i(j)}\}$ for some variable $x_k$, and $z_k < z_{\sigma_i(j)}$. Define
$\sigma_{i+1}$ to be

    $\sigma_{i+1}(l) = \sigma_i(l)$ if $l \ne j$,
    $\sigma_{i+1}(l) = k$ if $l = j$.

We will first show that $\sigma_{i+1}$ is LDF, which implies (as we argued for $\sigma_i$) that $\mathcal{N}_{\sigma_{i+1}}(y)$ is well-defined,
and then we will show that $\mathcal{N}_{\sigma_{i+1}}(y) \le z$ and $\mathcal{N}_{\sigma_{i+1}}(y) \ne z$.


Claim 7.5. $\sigma_{i+1}$ is LDF.


Proof. Suppose for a contradiction that $\sigma_{i+1}$ is not LDF. Then there is a bottom SCC $S$ of $x =
P_{\sigma_{i+1}}(x)$, with $(P_{\sigma_{i+1}})_S(x_S) = B_S x_S$ where $B_S$ is a stochastic irreducible matrix. $S$ must include $j$
and $k$ since otherwise $\sigma_i$ would not be LDF. Note that since $S$ is a linear degenerate bottom SCC,
for coordinates $l \in S$ we have $(P^y_{\sigma_{i+1}}(x))_l = (P_{\sigma_{i+1}}(x))_l$. Now we have $(P_{\sigma_{i+1}}(z))_j = (P^y_{\sigma_{i+1}}(z))_j < z_j$,
but for every other coordinate $l \in S$ such that $l \ne j$, $(P_{\sigma_{i+1}}(z))_l = (P^y_{\sigma_{i+1}}(z))_l = (P^y_{\sigma_i}(z))_l = z_l$.
Thus $(P_{\sigma_{i+1}}(z))_S = (B_S z_S) \le z_S$, but with $(P_{\sigma_{i+1}}(z))_j = (B_S z_S)_j < z_j$. However, if we
let $j'' \in S$ be a coordinate of $z_S$ with minimum value, we see that $(P_{\sigma_{i+1}}(z))_{j''}$ is just a convex
combination of the other coordinates of $z_S$. Thus the coordinates of $z_S$ that appear in $(P_{\sigma_{i+1}}(z))_{j''}$
must all also have the minimum value, and thus they are equal to $z_{j''}$. Repeating this argument,
since $S$ is strongly connected, this implies that all coordinates of $z_S$ are equal to $z_{j''}$. But this
contradicts the strict inequality $(P_{\sigma_{i+1}}(z))_j < z_j$ in the $j$ coordinate. Thus, $\sigma_{i+1}$ must be LDF. □

Therefore, $\mathcal{N}_{\sigma_{i+1}}(y)$ is well-defined.

We know $(P^y_{\sigma_{i+1}}(z))_j < z_j$, but for every coordinate $l \ne j$, $(P^y_{\sigma_{i+1}}(z))_l = z_l$. So we have
$P^y_{\sigma_{i+1}}(z) \le z$. Lemma 7.3 (ii) yields that $\mathcal{N}_{\sigma_{i+1}}(y) \le z$. But $\mathcal{N}_{\sigma_{i+1}}(y) \ne z$ because $P^y_{\sigma_{i+1}}(z) \ne z$,
whereas by Lemma 7.3 (i) from [12], we have $P^y_{\sigma_{i+1}}(\mathcal{N}_{\sigma_{i+1}}(y)) = \mathcal{N}_{\sigma_{i+1}}(y)$.

Thus the algorithm gives us a sequence of deterministic LDF policies $\sigma_1, \sigma_2, \ldots$, with $\mathcal{N}_{\sigma_1}(y) \ge
\mathcal{N}_{\sigma_2}(y) \ge \mathcal{N}_{\sigma_3}(y) \ge \cdots$, where each step must decrease at least one coordinate of $\mathcal{N}_{\sigma_i}(y)$. It follows
that $\sigma_i \ne \sigma_j$ unless $i = j$. There are only finitely many deterministic policies. So the sequence must
be finite and the algorithm terminates. But it only terminates when we reach a (deterministic) LDF
policy $\sigma_i$ with $P^y(\mathcal{N}_{\sigma_i}(y)) = \mathcal{N}_{\sigma_i}(y)$.

Recall that $I(x)$ is defined to be the unique optimal solution to the following LP:

Maximize: $\sum_i a_i$ ; Subject to: $P^x(a) \ge a$

We want to establish that $I(y)$ is well defined, i.e. that the LP has a unique optimal solution, and
that this solution is $\mathcal{N}_{\sigma_i}(y)$. First, $\mathcal{N}_{\sigma_i}(y)$ is a feasible solution to the LP since $P^y(\mathcal{N}_{\sigma_i}(y)) = \mathcal{N}_{\sigma_i}(y)$.
Furthermore, if $a$ is any feasible solution, i.e., if $P^y(a) \ge a$, then $P^y_{\sigma_i}(a) \ge P^y(a) \ge a$, so by Lemma
7.3 (ii), $a \le \mathcal{N}_{\sigma_i}(y)$. So $\mathcal{N}_{\sigma_i}(y)$ has the maximum value in every coordinate among all feasible
solutions. Thus, it is the unique optimal solution to the LP, and $I(y) = \mathcal{N}_{\sigma_i}(y)$. □
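The characterization $I(y) = \mathcal{N}_\sigma(y)$ for a policy $\sigma$ with $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$ can be tried out by hand on a toy system. The sketch below uses the minPPS $a = \frac{2}{3}b^2 + \frac{1}{3}$, $b = \min(a, c)$, $c = \frac{2}{3}$, whose GFP is $(1/2, 1/2, 2/3)$. Several things here are assumptions made for illustration: the system is not in SNF, the linearization of the single nonlinear equation at $y$ is taken to be $P_a(y) + \frac{4}{3} y_b (x_b - y_b)$, no rounding is applied, and instead of solving the LP the code simply tries both policies for $b$ and keeps the one whose linearized solution is a fixed point of the min-system.

```python
from fractions import Fraction as F

C = F(2, 3)  # constant value of c

def gnm_step(y_b):
    """One exact GNM-style step for the toy minPPS; returns the new (a, b)."""
    beta = F(4, 3) * y_b                               # d/db of (2/3) b^2 + 1/3 at y_b
    alpha = F(2, 3) * y_b ** 2 + F(1, 3) - beta * y_b  # so that a = alpha + beta * b
    candidates = []
    a = alpha / (1 - beta)                # policy "b := a": solve a = alpha + beta * a
    candidates.append((a, a))
    candidates.append((alpha + beta * C, C))  # policy "b := c"
    for a, b in candidates:
        if b == min(a, C):                # fixed point of the linearized min-system
            return a, b
    raise ValueError("no policy satisfies the fixed-point condition")

def gnm_iterates(steps):
    """Iterate gnm_step starting from the all-0 vector."""
    ys = [(F(0), F(0))]
    for _ in range(steps):
        ys.append(gnm_step(ys[-1][1]))
    return ys

ys = gnm_iterates(5)
```

On this example the policy "b := a" is selected at every step, and the iterates $(1/3, 1/3), (7/15, 7/15), \ldots$ increase towards $(1/2, 1/2)$ from below, staying at or under the GFP as Lemma 7.7 below requires.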

Now we can show a halving result for GFPs of minPPS, similar to the following lemma that was 
shown in [12] for LFPs of PPS: 

Lemma 7.6. (IWj, Lemma 3.17) If x = P{x) is a PPS and we are given a,b € with 0 < a < 
b < P{b) < 1, and if the following conditions hold: 

A > 0 and b — a < X{1 — b) and (/ — B{a))~^ exists and is non-negative, 

then b — Af{a) < ^{1 — b). 

We show an analogous lemma for GFP of minPPS. 

Lemma 7.7. Let $x = P(x)$ be a minPPS with GFP $g^* < 1$. For any $0 \leq y \leq g^*$ and $\lambda > 0$, we
have $I(y) \leq g^*$, and if:

$g^* - y \leq \lambda(1 - g^*)$

then

$g^* - I(y) \leq \frac{\lambda}{2}(1 - g^*)$

Footnote (to the LP defining $I(x)$): As we explained in Section 2, the constraints $P^x(a) \geq a$ can be written as linear inequalities.

Proof. By Lemma 7.4, there is a deterministic LDF policy $\sigma$ with $I(y) = \mathcal{N}_\sigma(y)$. We apply Lemma
7.6 to the PPS $x = P_\sigma(x)$, with its variable $a$ replaced by our $y$, and with its variable $b$ replaced
by $g^*$. Observe that $P_\sigma(g^*) \geq P(g^*) = g^*$ and that $(I - B_\sigma(y))^{-1}$ exists and is non-negative. Thus
the conditions of Lemma 7.6 hold and we can conclude that $g^* - \mathcal{N}_\sigma(y) \leq \frac{\lambda}{2}(1 - g^*)$.

All that remains is to show that $I(y) = \mathcal{N}_\sigma(y) \leq g^*$. By Lemma 7.2 (3.), there is an LDF
policy $\tau$ with $g^* = q^*_\tau$. By Lemma 5.4 applied to the PPS $x = P_\tau(x)$, the matrix $(I - B_\tau(y))^{-1}$
exists and is non-negative, and $\mathcal{N}_\tau(y)$ is well-defined. Any solution $a$ to the LP defining $I(y)$ has
$P^y_\tau(a) \geq P^y(a) \geq a$, so $a \leq \mathcal{N}_\tau(y)$ by Lemma 7.3 (ii). So $I(y) \leq \mathcal{N}_\tau(y)$. But we know from Lemma
3.4 of [12] that for any PPS with LFP $q^* < 1$, if $y \leq q^*$ and $\mathcal{N}(y)$ is defined, then $\mathcal{N}(y) \leq q^*$.
Applying this lemma to the PPS $x = P_\tau(x)$, and since $q^*_\tau = g^* < 1$ and $y \leq g^*$, we conclude that
$\mathcal{N}_\tau(y) \leq q^*_\tau$. Therefore, $I(y) \leq \mathcal{N}_\tau(y) \leq q^*_\tau = g^*$. □

Proof of Theorem 7.1. The theorem now follows by directly applying exactly the same inductive
argument as given in [12] for the proof of Theorem 3.21 there for the LFP. Specifically, we start
GNM at $x^{(0)} := 0$, and we let $x^{(k)}$ denote the $k$'th iterate of GNM (applied on the minPPS,
$x = P(x)$, which has $g^* < 1$), with rounding parameter $h := j + 2 + 4|P|$. In all iterations we have
$0 \leq x^{(k)} \leq g^*$. Let $(1 - g^*)_{\min} = \min_i (1 - g^*)_i$. As in [12], we claim, by induction on $k$, that for
all $k \geq 0$:

$$g^* - x^{(k)} \leq \left(2^{-k} + \sum_{i=0}^{k-1} 2^{-(h+i)}\right) \frac{1}{(1 - g^*)_{\min}} (1 - g^*).$$

For the base case, $k = 0$, we have

$$g^* - x^{(0)} = g^* \leq \mathbf{1} \leq \frac{1}{(1 - g^*)_{\min}} (1 - g^*).$$

For the induction step, let us write for simplicity the claimed inequality as $g^* - x^{(k)} \leq \lambda_k (1 - g^*)$.
The induction hypothesis $g^* - x^{(k-1)} \leq \lambda_{k-1}(1 - g^*)$ implies by Lemma 7.7 that $g^* - I(x^{(k-1)}) \leq
\frac{\lambda_{k-1}}{2}(1 - g^*)$. The $k$th iterate $x^{(k)}$ satisfies $x^{(k)}_i \geq I(x^{(k-1)})_i - 2^{-h}$ in every coordinate $i$. Therefore,

$$g^* - x^{(k)} \leq \frac{\lambda_{k-1}}{2}(1 - g^*) + 2^{-h}\mathbf{1} \leq \left(\frac{\lambda_{k-1}}{2} + \frac{2^{-h}}{(1 - g^*)_{\min}}\right)(1 - g^*) = \lambda_k (1 - g^*).$$

This shows the claimed inequality. Since $\sum_{i=0}^{k-1} 2^{-(h+i)} < 2^{-h+1}$, the inequality implies that

$$g^* - x^{(k)} \leq (2^{-k} + 2^{-h+1}) \frac{1}{(1 - g^*)_{\min}} (1 - g^*), \quad \text{for all } k.$$

Let $\sigma^*$ be the (deterministic) LDF policy of Lemma 7.2 (3.) with $q^*_{\sigma^*} = g^*$. It was shown in [11]
(Lemma 3.12), that if the LFP of a PPS $x = P(x)$ is $< 1$ then the difference from 1 is at least $2^{-4|P|}$
in every coordinate. Applying this lemma to the PPS $x = P_{\sigma^*}(x)$ and noting that $|P_{\sigma^*}| \leq |P|$, we
have that $\frac{1}{(1 - g^*)_{\min}} \leq \frac{1}{2^{-4|P|}} = 2^{4|P|}$. Therefore, $\|g^* - x^{(k)}\|_\infty \leq (2^{-k} + 2^{-h+1}) 2^{4|P|}$. If we then let
$k = h = j + 4|P| + 2$, we get that $\|g^* - x^{(h)}\|_\infty \leq 2^{-j}$. □
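The bookkeeping in this induction is easy to get wrong, so a small numerical check may help. The following sketch is an illustration only: the function names are ours, `m` stands for $(1 - g^*)_{\min}$, and `size` stands for the encoding size $|P|$. It compares the closed form of $\lambda_k$ with the recurrence $\lambda_k = \lambda_{k-1}/2 + 2^{-h}/(1-g^*)_{\min}$ and confirms that $k = h = j + 2 + 4|P|$ iterations drive the bound below $2^{-j}$ in the worst case $(1 - g^*)_{\min} = 2^{-4|P|}$.

```python
from fractions import Fraction as Fr

def lam_closed(k, h, m):
    # Closed form used in the induction: (2^-k + sum_{i<k} 2^-(h+i)) / m,
    # where m stands for (1 - g*)_min.
    return (Fr(1, 2**k) + sum(Fr(1, 2**(h + i)) for i in range(k))) / m

def lam_recursive(k, h, m):
    # lambda_0 = 1/m; each GNM step halves the factor (Lemma 7.7) and then
    # pays 2^-h / m for rounding down to h bits.
    lam = Fr(1) / m
    for _ in range(k):
        lam = lam / 2 + Fr(1, 2**h) / m
    return lam

def iterations_suffice(j, size):
    # With h = j + 2 + 4*size and m = 2^(-4*size) (the worst case allowed),
    # the bound after k = h steps is at most 2^-j.
    h = j + 2 + 4 * size
    m = Fr(1, 2**(4 * size))
    return lam_closed(h, h, m) <= Fr(1, 2**j)
```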


8 Computing $\epsilon$-optimal policies for the GFP of minPPSs in P-time.

In this section we show how to construct an $\epsilon$-optimal randomized policy for the GFP of a minPPS,
$x = P(x)$, in time polynomial in the input encoding size $|P|$ and $\log(1/\epsilon)$; note that there may
not exist any deterministic $\epsilon$-optimal policy (recall Example 3.2). We also consider BMDPs with
the minimum non-reachability (i.e., maximum reachability) objective and show how to construct a
deterministic non-static $\epsilon$-optimal strategy, again in time polynomial in $|P|$ and $\log(1/\epsilon)$.

Given a minPPS, $x = P(x)$, with $n$ variables, we first preprocess it to identify and remove all
variables with value 1 in the GFP; the policy can be set arbitrarily for all these nodes of type M that
have value 1. So assume henceforth that $g^* < 1$. We first show how to find a deterministic LDF
policy $\sigma$ with $\|g^* - q^*_\sigma\|_\infty \leq \frac{1}{2}\epsilon$. We will then use this policy to construct an $\epsilon$-optimal (randomized)
policy. Both steps are conducted in time polynomial in $|P|$ and $\log(1/\epsilon)$.

We use the following algorithm to construct a deterministic LDF policy $\sigma$ with $\|g^* - q^*_\sigma\|_\infty \leq \frac{1}{2}\epsilon$.
Note that each step of the algorithm runs in time polynomial in $|P|$ and $\log(1/\epsilon)$.

Algorithm minPPS-$\epsilon$-policy1

1. Compute, using GNM, a vector $y$ with $0 \leq y \leq g^*$ and $\|g^* - y\|_\infty \leq 2^{-14|P|-3}\epsilon$.

2. Let $k := 0$, and let $\sigma_0$ be a policy that has $P_{\sigma_0}(y) = P(y)$, i.e., $\sigma_0$ chooses for each type M
variable $x_i$ a variable $x_j$ of $P_i(x)$ that has the minimum value in the vector $y$.

3. Compute $F_{\sigma_k}$, the set of variables that, in the dependency graph of $x = P_{\sigma_k}(x)$, either are or
can reach a variable $x_i$ which either has form Q or else $P_i(1) < 1$ or $P_i(0) > 0$. Let $D_{\sigma_k}$ be
the complement of $F_{\sigma_k}$.

4. If $D_{\sigma_k} \neq \emptyset$, find a variable $x_i$ of type M in $D_{\sigma_k}$ that has a choice $x_j$ in $F_{\sigma_k}$ which isn't its
current choice, such that $|y_i - y_j| \leq 2^{-14|P|-2}\epsilon$. Let $\sigma_{k+1}$ be the policy which chooses $x_j$ at
$x_i$, and otherwise agrees with $\sigma_k$. Let $k := k + 1$, and return to step 3.

5. Else, i.e., if $D_{\sigma_k}$ is empty, output $\sigma_k$ and terminate.
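Steps 2 to 5 of the algorithm are purely combinatorial once the approximation $y$ is available, and can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the dictionary encoding of equations (`("L", coefficients, constant)`, `("Q", j, k)`, `("M", j, k)`), the function names, and the parameter `tol` (which plays the role of the switching threshold in step 4) are all assumptions introduced here. Step 1 (running GNM) is not implemented; `y` is taken as an input.

```python
from fractions import Fraction as Fr

def successors(eqs, policy, v):
    kind = eqs[v][0]
    if kind == "L":
        return [u for u, c in eqs[v][1].items() if c > 0]
    if kind == "Q":
        return [eqs[v][1], eqs[v][2]]
    return [policy[v]]                      # type M: only the chosen variable

def is_witness(eqs, v):
    # Form Q, or P_v(1) < 1, or P_v(0) > 0 (the last two only arise for form L).
    kind = eqs[v][0]
    if kind == "Q":
        return True
    if kind == "L":
        coeffs, const = eqs[v][1], eqs[v][2]
        return const > 0 or sum(coeffs.values()) + const < 1
    return False

def reach_set(eqs, policy):
    # F_sigma: variables that are, or can reach, a witness (backward closure).
    F = {v for v in eqs if is_witness(eqs, v)}
    changed = True
    while changed:
        changed = False
        for v in eqs:
            if v not in F and any(u in F for u in successors(eqs, policy, v)):
                F.add(v)
                changed = True
    return F

def ldf_policy(eqs, y, tol):
    # Step 2: sigma_0 picks, at each M variable, a choice of minimum y-value.
    policy = {v: min(e[1:], key=lambda u: y[u]) for v, e in eqs.items() if e[0] == "M"}
    while True:
        F = reach_set(eqs, policy)           # step 3
        D = set(eqs) - F
        if not D:                            # step 5
            return policy
        for v in sorted(D):                  # step 4
            if eqs[v][0] != "M":
                continue
            alts = [u for u in eqs[v][1:]
                    if u in F and u != policy[v] and abs(y[v] - y[u]) <= tol]
            if alts:
                policy[v] = alts[0]
                break
        else:
            raise ValueError("no admissible switch; y is not accurate enough for tol")

# Hypothetical example: x1 = min(x2, x3), x2 = x1, x3 = x3/2 + 1/4, with GFP (1/2, 1/2, 1/2).
EXAMPLE = {
    "x1": ("M", "x2", "x3"),
    "x2": ("L", {"x1": Fr(1)}, Fr(0)),
    "x3": ("L", {"x3": Fr(1, 2)}, Fr(1, 4)),
}
```

In the example, the initial policy sends `x1` to `x2`, which creates a linear degenerate cycle; a single switch to `x3` repairs it.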

We will show that the final policy $\sigma$ computed by this algorithm has the desirable property. To
start, we will extend the following lemma from [12] to GFPs of minPPS.

Lemma 8.1 (Lemma 4.4 from [12]). If $x = P(x)$ is a max/minPPS, and if $0 \leq y \leq q^*$, then
$\|P(y) - y\|_\infty \leq 2\|q^* - y\|_\infty$.

Lemma 8.2. If $x = P(x)$ is a minPPS, and if $0 \leq y \leq g^* < 1$, then $\|P(y) - y\|_\infty \leq 2\|g^* - y\|_\infty$.

Proof. Let $\sigma^*$ be the (deterministic) LDF policy of Lemma 7.2 (3.) that has $q^*_{\sigma^*} = g^*$. We apply
Lemma 8.1 to the PPS $x = P_{\sigma^*}(x)$. This yields $\|P_{\sigma^*}(y) - y\|_\infty \leq 2\|g^* - y\|_\infty$.

So for any $x_i$ not of form M, we have $|P_i(y) - y_i| = |(P_{\sigma^*}(y) - y)_i| \leq 2\|g^* - y\|_\infty$. For $x_i$ of form
M, we have $P_i(x) = \min\{x_j, x_k\}$ for some variables $x_j$, $x_k$. Suppose wlog that $y_j \leq y_k$ and thus
$P_i(y) = y_j$. Then we have $P_i(y) = y_j \geq g^*_j - \|g^* - y\|_\infty \geq g^*_i - \|g^* - y\|_\infty$. Since $P(y) \leq P(g^*) = g^*$,
$P_i(y) \leq g^*_i$. For $y_i$, we also have $g^*_i - \|g^* - y\|_\infty \leq y_i \leq g^*_i$. Therefore, $|P_i(y) - y_i| \leq \|g^* - y\|_\infty$. □

We use this lemma to bound $\|P_\sigma(y) - y\|_\infty$ for the policy $\sigma$ output by the algorithm.

Lemma 8.3. Algorithm minPPS-$\epsilon$-policy1 always terminates in at most $n$ iterations of steps (3.)-(4.),
and outputs a deterministic LDF policy $\sigma$ with $\|P_\sigma(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$. Since each iteration
runs in time polynomial in $|P|$ and $\log(1/\epsilon)$, so does the entire algorithm.

Footnote (to step 4 of the algorithm): We will show that such a variable $x_i$ always exists whenever we reach this step.

Proof. We first note that if the algorithm terminates, then it outputs an LDF policy since every
variable in $F_{\sigma_k}$ satisfies condition (ii) of Lemma 5.1 applied to the PPS $x = P_{\sigma_k}(x)$. We need to show
that the algorithm terminates in the specified number of iterations, and that the final policy satisfies
the claimed bound.

At step 1 of the algorithm, we have $\|g^* - y\|_\infty \leq 2^{-14|P|-3}\epsilon$. Thus, by Lemma 8.2 we have
$\|P(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$. It follows by the choice of $\sigma_0$ that $\|P_{\sigma_0}(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$. Whenever we
switch $x_i$ of form M from $x_l$ to $x_j$ at an iteration $k$, we have $|(P_{\sigma_{k+1}}(y) - y)_i| = |y_j - y_i| \leq 2^{-14|P|-2}\epsilon$
since we required that $|y_i - y_j| \leq 2^{-14|P|-2}\epsilon$. So for all $k$, $\|P_{\sigma_k}(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$. Thus, if the
algorithm terminates, it outputs an LDF policy $\sigma$ with $\|P_\sigma(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$.

Next we show that if $D_{\sigma_k}$ is non-empty in some iteration $k$, then it contains an $x_i$ of form M
which has a choice $x_j$ in $F_{\sigma_k}$ with $|y_i - y_j| \leq 2^{-14|P|-2}\epsilon$. Consider any $x_l$ in $D_{\sigma_k}$. Let $\sigma^*$ be a
(deterministic) LDF policy such that $g^* = q^*_{\sigma^*}$ (which exists by Lemma 7.2 (3.)). $\sigma^*$ is an LDF
policy so there is a path in the dependency graph of $x = P_{\sigma^*}(x)$ from $x_l$ to some $x_m$ which is not
of form M and is either of form Q or has $P_m(1) < 1$ or $P_m(0) > 0$. Thus $x_m$ is in $F_{\sigma_k}$. So there must
be a variable $x_i$ on the path from $x_l \in D_{\sigma_k}$ to $x_m \in F_{\sigma_k}$, with $x_i \in D_{\sigma_k}$, which depends directly on
an $x_j$ which is next in the path and such that $x_j \in F_{\sigma_k}$. So $(P_{\sigma^*}(x))_i$ contains a term with $x_j$ and
$(P_{\sigma_k}(x))_i$ does not. Thus $x_i$ is of form M and $(P_{\sigma^*}(x))_i = x_j$. But applying Lemma 8.1 to the PPS
$x = P_{\sigma^*}(x)$ gave us that $\|P_{\sigma^*}(y) - y\|_\infty \leq 2\|g^* - y\|_\infty$. So $|y_i - y_j| \leq 2\|g^* - y\|_\infty \leq 2^{-14|P|-2}\epsilon$. We
can thus switch $x_i$ to $x_j$ in step 4.

Every variable in $F_{\sigma_k}$ reaches its witness along a path that stays inside $F_{\sigma_k}$, and the switch only
changes the equation of $x_i \in D_{\sigma_k}$, so we have that $F_{\sigma_{k+1}} \supseteq F_{\sigma_k} \cup \{x_i\}$. Since
there are only $n$ variables, this means that for some $k \leq n$, all are in $F_{\sigma_k}$ and the algorithm
terminates in at most $n$ iterations of the steps (3.) and (4.). □


We now show that the policy $\sigma$ has the desired property.


Lemma 8.4. The output policy $\sigma$ of Algorithm minPPS-$\epsilon$-policy1 satisfies $q^*_\sigma < 1$ and $\|g^* - q^*_\sigma\|_\infty \leq
\frac{1}{2}\epsilon$.

Proof. We will show the lemma in two steps. In Step 1, we will show that $q^*_\sigma < 1$. In Step 2 we
will use this to show that $\|g^* - q^*_\sigma\|_\infty \leq \frac{1}{2}\epsilon$.

Step 1: $q^*_\sigma < 1$.

This section of the proof is essentially identical to part of the proof of Theorem 4.7 in [12].
Suppose, for contradiction, that for some $i$, $(q^*_\sigma)_i = 1$. Then by results in [15], $x = P_\sigma(x)$ has a
bottom strongly connected component $S$ with $q^*_S = 1$. If $x_i$ is in $S$ then only variables in $S$ appear
in $(P_\sigma)_i(x)$, so we write $x_S = P_S(x_S)$ for the PPS which is formed by such equations. We also have
that $B_S(1)$ is irreducible and that the least fixed point solution of $x_S = P_S(x_S)$ is $q^*_S = 1$. Take $y_S$
to be the subvector of $y$ with coordinates in $S$.

We will apply Theorem 4.6 (ii) from [12], which states that if a PPS $x = P(x)$ is strongly
connected, has LFP $q^* = 1$, and a vector $y$ satisfies $0 \leq y < 1 = q^*$, then $(I - B(y))^{-1}$ exists,
is nonnegative, and $\|(I - B(y))^{-1}\|_\infty \leq 2^{4|P|}/(1 - y)_{\min}$. Applying this theorem to the PPS
$x_S = P_S(x_S)$ with $\frac{1}{2}(y_S + 1)$ in place of $y$, gives that

$$\|(I - B_S(\tfrac{1}{2}(y_S + 1)))^{-1}\|_\infty \leq \frac{2^{4|P_S|}}{\frac{1}{2}(1 - y_S)_{\min}}.$$

But $|P_S| \leq |P|$ and $(1 - y_S)_{\min} \geq (1 - g^*)_{\min} \geq 2^{-4|P|}$. Thus

$$\|(I - B_S(\tfrac{1}{2}(y_S + 1)))^{-1}\|_\infty \leq 2^{8|P|+1}.$$

From Lemma 5.3, $P_S(y_S) - P_S(1) = P_S(y_S) - 1 = B_S(\frac{1}{2}(1 + y_S))(y_S - 1)$. Hence,
$(I - B_S(\frac{1}{2}(1 + y_S)))(1 - y_S) = 1 - y_S + P_S(y_S) - 1 = P_S(y_S) - y_S$, and therefore:

$$1 - y_S = (I - B_S(\tfrac{1}{2}(1 + y_S)))^{-1}(P_S(y_S) - y_S).$$

Taking norms and re-arranging gives:

$$\|P_S(y_S) - y_S\|_\infty \geq \frac{\|1 - y_S\|_\infty}{\|(I - B_S(\frac{1}{2}(1 + y_S)))^{-1}\|_\infty} \geq \frac{2^{-4|P|}}{2^{8|P|+1}} = 2^{-12|P|-1}.$$

However $\|P_S(y_S) - y_S\|_\infty \leq \|P_\sigma(y) - y\|_\infty$ and $\|P_\sigma(y) - y\|_\infty \leq 2^{-14|P|-2}\epsilon$ by Lemma 8.3. This is
a contradiction and so $q^*_\sigma < 1$.

Step 2: $\|g^* - q^*_\sigma\|_\infty \leq \frac{1}{2}\epsilon$.

Now that we have $q^*_\sigma < 1$, we can apply the following generalisation of Theorem 4.6 (i) of [12].

Lemma 8.5 (cf. Theorem 4.6 (i) of [12]). If $x = P(x)$ is an LDF PPS with $q^* < 1$ and $0 \leq y < 1$,
then $(I - B(\frac{1}{2}(y + q^*)))^{-1}$ exists, is nonnegative, and

$$\|(I - B(\tfrac{1}{2}(y + q^*)))^{-1}\|_\infty \leq 2^{10|P|} \max\{2(1 - y)^{-1}_{\min},\ 2^{|P|}\}.$$

Proof. The only difference between Lemma 8.5 and the corresponding Theorem 4.6 (i) of [12] is
that instead of assuming that $0 < q^* < 1$ there, here we assume that $q^* < 1$ and that $x = P(x)$
is an LDF PPS. Furthermore, the only part of the proof of Theorem 4.6 (i) which employs the
assumption that $q^* > 0$ is Lemma C.8 of [12], for which we now establish the analogous Lemma
8.6 below, under the alternative assumption that $x = P(x)$ is an LDF PPS.

Lemma 8.6. For any LDF-PPS, $x = P(x)$, with LFP $q^* < 1$, for any variable $x_i$ either
(I) the equation $x_i = P_i(x)$ is of form Q, or else $P_i(1) < 1$, or

(II) $x_i$ depends (directly or indirectly) on a variable $x_j$, such that $x_j = P_j(x)$ is of form Q, or else
$P_j(1) < 1$.

Proof. Consider the set $S$ of $x_i$ which do not satisfy either (I) or (II); i.e., $S$ is the set of variables
that cannot reach in the dependency graph any node $x_j$ that has type Q or is deficient ($P_j(1) < 1$).
Suppose for a contradiction that $S$ is non-empty. No element $x_i$ in $S$ can depend on an element
outside of $S$ since otherwise by transitivity of dependence it would satisfy (II). Consider the LDF-PPS
$x_S = P_S(x_S)$. Since this has no variables of form Q, $P_S(x_S)$ is affine, i.e. we have $P_S(x_S) =
B_S(0)x_S + P_S(0)$. So for any fixed point $q_S$ of $x_S = P_S(x_S)$, we have $q_S = B_S(0)q_S + P_S(0)$.
Since $x = P(x)$ is LDF, Lemma 5.4 yields that $(I - B_S(0))^{-1}$ exists and is non-negative. So we
get $q_S = (I - B_S(0))^{-1}P_S(0)$, i.e. the linear system is non-singular, it has a unique solution, so
$x_S = P_S(x_S)$ has a unique fixed point. But because (I) does not hold for any variable in $S$, we have
$P_S(1) = 1$. So the unique fixed point is $q^*_S = 1$. This contradicts the assumption that $q^* < 1$ and
so $S$ is empty. □

The rest of the proof of Lemma 8.5 is word-for-word identical to the rest of the proof of Theorem
4.6 (i) from [12] (using Lemma 8.6 instead of Lemma C.8 there), so we will not repeat it here. □

Corollary 8.7. If $x = P(x)$ is an LDF PPS with $0 \leq q^* < 1$, then $\|(I - B(q^*))^{-1}\|_\infty \leq 2^{14|P|+1}$.

Proof. We substitute $y := q^*$ in Lemma 8.5 along with the bound $(1 - q^*)_{\min} \geq 2^{-4|P|}$ from Theorem
3.12 of [11]. □

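We can now complete Step 2 of the proof of Lemma 8.4. By Lemma 5.3, $B_\sigma(\frac{1}{2}(q^*_\sigma + y))(q^*_\sigma - y) =
q^*_\sigma - P_\sigma(y)$. Rearranging this gives $q^*_\sigma - y = (I - B_\sigma(\frac{1}{2}(q^*_\sigma + y)))^{-1}(P_\sigma(y) - y)$. Taking norms, using
the fact that $y \leq g^* \leq q^*_\sigma$ (since $\sigma$ is LDF), and applying Corollary 8.7 on the PPS $x = P_\sigma(x)$ and
Lemma 8.3 we have:

$$\|q^*_\sigma - y\|_\infty \leq \|(I - B_\sigma(\tfrac{1}{2}(q^*_\sigma + y)))^{-1}\|_\infty \|P_\sigma(y) - y\|_\infty$$

$$\leq \|(I - B_\sigma(q^*_\sigma))^{-1}\|_\infty \|P_\sigma(y) - y\|_\infty$$

$$\leq 2^{14|P|+1} \cdot 2^{-14|P|-2}\epsilon = \tfrac{1}{2}\epsilon.$$

By Lemma 7.2 (2.), we have $g^* \leq q^*_\sigma$. We have $y \leq g^* \leq q^*_\sigma$ and so $\|g^* - q^*_\sigma\|_\infty \leq \|q^*_\sigma - y\|_\infty \leq \frac{1}{2}\epsilon$. □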

We define a randomized policy $\upsilon$ for the minPPS as follows. Let $\sigma$ be the policy computed by
Algorithm minPPS-$\epsilon$-policy1 and let $\tau$ be a (LDF) deterministic policy that satisfies $g^*_\tau < 1$ (which
can be computed in P-time by Proposition 4.1). For each type M variable, the policy $\upsilon$ follows with
probability $2^{-28|P|-3}\epsilon$ the choice of policy $\tau$, and with the remaining probability $1 - 2^{-28|P|-3}\epsilon$ the
choice of policy $\sigma$.
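The effect of mixing two deterministic policies can be made concrete with a short sketch. Everything below is illustrative: the dictionary encoding of equations, the function names, and the mixing probability `p` (which plays the role of the small probability with which $\upsilon$ follows $\tau$) are assumptions introduced here, not notation from the paper. The sketch evaluates $P_\sigma$, $P_\tau$ and the mixture, and the accompanying check confirms that the mixture differs from $P_\sigma$ by exactly `p` times the difference between $P_\sigma$ and $P_\tau$.

```python
from fractions import Fraction as Fr

def apply_policy(eqs, choice, x):
    """Evaluate P_policy(x); choice[v] is a dict {chosen variable: probability}."""
    out = {}
    for v, e in eqs.items():
        if e[0] == "L":
            out[v] = sum(c * x[u] for u, c in e[1].items()) + e[2]
        elif e[0] == "Q":
            out[v] = x[e[1]] * x[e[2]]
        else:  # type M: expectation over the randomized choice
            out[v] = sum(p * x[u] for u, p in choice[v].items())
    return out

def mix(sigma, tau, p):
    """Randomized policy: follow tau with probability p, sigma otherwise."""
    mixed = {}
    for v in sigma:
        dist = {sigma[v]: 1 - p}
        dist[tau[v]] = dist.get(tau[v], 0) + p
        mixed[v] = dist
    return mixed

def sup_dist(a, b):
    # Infinity-norm distance between two vectors stored as dicts.
    return max(abs(a[v] - b[v]) for v in a)

# Hypothetical minPPS: x1 = min(x2, x3), x2 = x1, x3 = x3/2 + 1/4.
EQS = {
    "x1": ("M", "x2", "x3"),
    "x2": ("L", {"x1": Fr(1)}, Fr(0)),
    "x3": ("L", {"x3": Fr(1, 2)}, Fr(1, 4)),
}
```

A deterministic policy is recovered as the special case `mix(sigma, sigma, 0)`.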

Theorem 8.8. The policy $\upsilon$ satisfies $\|g^* - g^*_\upsilon\|_\infty \leq \epsilon$, i.e., it is $\epsilon$-optimal.

Proof. We will show that $q^*_\upsilon$ is close to $q^*_\sigma$ and $g^*$, and that $g^*_\upsilon = q^*_\upsilon$. First note that $P_\upsilon(g^*) \geq g^*$:
for variables $x_i$ of the minPPS that have type L or Q, $(P_\upsilon(g^*))_i = P_i(g^*) = g^*_i$, and for variables
$x_i$ of type M, e.g. $x_i = \min\{x_j, x_k\}$, we have $g^*_i = \min\{g^*_j, g^*_k\}$, and thus $(P_\upsilon(g^*))_i \geq g^*_i$. Since
$P_\upsilon(g^*) \geq g^*$, we have $q^*_\upsilon \geq g^*$ by Lemma 5.5. We seek a $z$ close to $g^*$ such that $g^* \leq q^*_\upsilon \leq z$.

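Lemma 8.9. For an LDF-PPS $x = P(x)$ with LFP $q^* < 1$, let $z = q^* + \delta(I - B(q^*))^{-1}\mathbf{1}$ where
$0 < \delta \leq 2^{-28|P|-3}$. Then $P(z) \leq z - \frac{1}{2}\delta\mathbf{1}$.

Proof. From Lemma 5.3,

$$B(\tfrac{1}{2}(q^* + z))(z - q^*) = P(z) - q^* \qquad (1)$$

From the definition of $z$ we have $(I - B(q^*))(z - q^*) = \delta\mathbf{1}$ and so

$$B(q^*)(z - q^*) = z - q^* - \delta\mathbf{1} \qquad (2)$$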

Footnote (to the proof of Lemma 8.5): This is the part that starts after the proof of Lemma C.8 on page 37 of [12] and finishes
with the desired norm bound inequality at the top of page 39 that completes the proof of part (i) of Theorem 4.6 there.

Subtracting (2) from (1), we obtain

$$(B(\tfrac{1}{2}(q^* + z)) - B(q^*))(z - q^*) = P(z) - z + \delta\mathbf{1}$$

If $P_i(x)$ is of form L, the $i$th row of $B(x)$ does not depend on $x$ so we have $P_i(z) - z_i + \delta = 0$ as
required.

If $P_i(x)$ is of form Q, wlog $P_i(x) = x_j x_k$, then we have $((B(\frac{1}{2}(q^* + z)) - B(q^*))(z - q^*))_i =
\frac{1}{2}(z_k - q^*_k)(z_j - q^*_j) + \frac{1}{2}(z_j - q^*_j)(z_k - q^*_k) = (z_j - q^*_j)(z_k - q^*_k)$. Thus we have $P_i(z) - z_i + \delta \leq \|z - q^*\|^2_\infty$.

But here $\|z - q^*\|^2_\infty \leq \delta^2 \|(I - B(q^*))^{-1}\|^2_\infty \leq \delta^2 2^{28|P|+2} \leq \frac{1}{2}\delta$, so $P_i(z) \leq z_i - \frac{1}{2}\delta$. □
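The mechanism behind Lemma 8.9 can be seen on a one-variable instance. The sketch below is a worked illustration under stated assumptions: the PPS $x = \frac{1}{2}x^2 + \frac{3}{8}$ with LFP $q^* = 1/2$ is a hypothetical example that does not appear in the paper, and the function names are ours. For this instance $(I - B(q^*))^{-1} = 2$, so $z = \frac{1}{2} + 2\delta$ and $z - \frac{1}{2}\delta - P(z) = \frac{1}{2}\delta - 2\delta^2$, which is non-negative exactly when $\delta \leq 1/4$. The lemma's threshold $2^{-28|P|-3}$ is a far more conservative, instance-independent version of this bound.

```python
from fractions import Fraction as Fr

# Hypothetical one-variable PPS (not from the paper): P(x) = (1/2) x^2 + 3/8, LFP q* = 1/2.
def P(x):
    return Fr(1, 2) * x * x + Fr(3, 8)

def B(x):
    # Jacobian of P; in one dimension, the derivative P'(x) = x.
    return x

q_star = Fr(1, 2)

def shifted_point(delta):
    # z = q* + delta * (I - B(q*))^{-1} * 1, specialised to one dimension.
    return q_star + delta / (1 - B(q_star))

def contraction_gap(delta):
    # (z - delta/2) - P(z); Lemma 8.9 asserts this is >= 0 for small enough delta.
    z = shifted_point(delta)
    return z - delta / 2 - P(z)
```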

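We apply this Lemma on the PPS $x = P_\sigma(x)$ with $\delta = 2^{-28|P|-2}\epsilon$, assuming without loss of generality
that $\epsilon \leq 1/2$ so that $\delta \leq 2^{-28|P|-3}$. We get that for $z =
q^*_\sigma + 2^{-28|P|-2}\epsilon(I - B_\sigma(q^*_\sigma))^{-1}\mathbf{1}$, $P_\sigma(z) \leq z - 2^{-28|P|-3}\epsilon\mathbf{1}$. For any $x \in [0,1]^n$, $P_\sigma(x) \in [0,1]^n$
and $P_\tau(x) \in [0,1]^n$, so $\|P_\sigma(x) - P_\tau(x)\|_\infty \leq 1$. So, by definition of $\upsilon$, $\|P_\sigma(x) - P_\upsilon(x)\|_\infty =
2^{-28|P|-3}\epsilon\|P_\sigma(x) - P_\tau(x)\|_\infty \leq 2^{-28|P|-3}\epsilon$. In particular $\|P_\sigma(z) - P_\upsilon(z)\|_\infty \leq 2^{-28|P|-3}\epsilon$. So
we have $P_\upsilon(z) \leq P_\sigma(z) + 2^{-28|P|-3}\epsilon\mathbf{1} \leq z$, so by Lemma 5.5, $q^*_\upsilon \leq z$. Now we have $g^* \leq q^*_\upsilon \leq z$, and
so using Lemma 8.4 and Corollary 8.7 we get:

$$\|q^*_\upsilon - g^*\|_\infty \leq \|z - g^*\|_\infty$$

$$\leq \|q^*_\sigma - g^*\|_\infty + \|z - q^*_\sigma\|_\infty$$

$$\leq \tfrac{1}{2}\epsilon + 2^{-28|P|-2}\epsilon \|(I - B_\sigma(q^*_\sigma))^{-1}\|_\infty$$

$$\leq \tfrac{1}{2}\epsilon + 2^{-28|P|-2}\epsilon \cdot 2^{14|P|+1}$$

$$\leq \epsilon$$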

Recall that a PPS $x = P(x)$ has $g^*_i < 1$ if and only if either $P_i(1) < 1$ or there is a path in the
dependency graph from $x_i$ to an $x_j$ with $P_j(1) < 1$. If there is a path from $x_i$ to $x_j$ in the dependency
graph of $x = P_\tau(x)$, then the same path exists in $x = P_\upsilon(x)$. Then by the same graph analysis that
gave us $g^*_\tau < 1$, we have $g^*_\upsilon < 1$. And so by Lemma 5.2, $q^*_\upsilon = g^*_\upsilon$. We have $\|g^* - g^*_\upsilon\|_\infty \leq \epsilon$. That
is, $\upsilon$ is an $\epsilon$-optimal policy. □

So, in a BMDP with minimum non-reachability (i.e., maximum reachability) objective, we can
construct efficiently, in time polynomial in the encoding size of the BMDP and $\log(1/\epsilon)$, a randomized
static $\epsilon$-optimal strategy. The following theorem shows that we can also construct a
deterministic non-static strategy.

Theorem 8.10. For a BMDP with minPPS $x = P(x)$, and minimum non-reachability probabilities
given by the GFP $g^* < 1$, the following deterministic non-static strategy $\alpha$ is also $\epsilon$-optimal starting
with one object of any type:

Use policy a that is the output of Algorithm minPPS-e-policyl, until the population has 
size at least ^ ^ for the first time; thereafter use a deterministic static policy r such 
that g* < 1. 

Proof. It follows from Lemma 5.6 that if we start the BP with an initial population of a single
object with type corresponding to $x_i$, $1 - (q^*_\sigma)_i$ is the probability that we either reach the target or
else the population tends to infinity as time tends to infinity. So under the strategy $\alpha$, with at least
probability 1 — ((7^)0 either reach a population of more than ^ ^ or we reach the target. 

Let p be the probability that we reach the population under a without reaching the 

target. Then 1 — {q%)i — p is the probability that we reach the target while staying under 
population. 

We claim that the probability of reaching the target from any population of size m > using 

r is at least 1 — ^e. For a single object of type corresponding to Xj, this probability is 1 — {g*)j > 
2-4|P| gince we can consider descendants of each member of the population independently, the 
probability that any of them reach the target is at least 1 — (1 — 2“'^!^!)”^ > 1 — 

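The probability of reaching the target using the non-static strategy $\alpha$ is then at least $1 - (q^*_\sigma)_i - p + p(1 - \frac{1}{2}\epsilon) \geq
(1 - g^*_i - \frac{1}{2}\epsilon) - p\frac{1}{2}\epsilon \geq 1 - g^*_i - \epsilon$. So $\alpha$ is $\epsilon$-optimal. □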
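The independence argument in this proof reduces to a simple calculation: if each object independently produces a descendant of the target type with probability at least $a$, then a population of size $m$ fails with probability at most $(1-a)^m$. The sketch below computes the smallest $m$ for which this failure probability is at most $\epsilon/2$. It is a numerical illustration only: the function name is ours, floating-point arithmetic is assumed to be adequate, and the result is not claimed to be the population threshold used in Theorem 8.10 (in the proof, $a$ would be the lower bound $2^{-4|P|}$, which is far too small for this naive floating-point approach on realistic encodings).

```python
import math

def min_population(a, eps):
    """Smallest m with (1 - a)**m <= eps / 2, for per-object success probability a."""
    if not (0 < a < 1 and 0 < eps < 2):
        raise ValueError("need 0 < a < 1 and 0 < eps < 2")
    m = math.ceil(math.log(eps / 2) / math.log1p(-a))
    # Guard against floating-point rounding at the boundary, in both directions.
    while (1 - a) ** m > eps / 2:
        m += 1
    while m > 1 and (1 - a) ** (m - 1) <= eps / 2:
        m -= 1
    return m
```

For instance, with `a = 0.5` and `eps = 0.25` three objects suffice, since $0.5^3 = 0.125 = \epsilon/2$.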


Corollary 8.11. Given a BMDP with a minimum non-reachability (i.e. maximum reachability)
objective, and any $\epsilon > 0$, we can compute a static randomized $\epsilon$-optimal strategy or a deterministic
non-static $\epsilon$-optimal strategy in time polynomial in both the encoding size of the BMDP and in
$\log(1/\epsilon)$.


9 P-time detection of GFP $g^*_i = 0$ for max-minPPSs and reachability value 1 for BSSGs.

In this section we give a P-time algorithm for deciding whether the value of a BSSG reachability
game is equal to 1 (i.e., whether $g^*_i = 0$ for a given max-minPPS), in which case we show that the
value is actually achieved by a specific, memoryful but deterministic, strategy for the maximizing
player, which we can compute in P-time. Thus there is no distinction between limit-sure vs. almost-sure
reachability for BSSG. Recall however that, as shown by Example 3.1, for a BSSG (or even
BMDP) with reachability value equal to 1 there need not exist a static (even randomized) strategy
that achieves almost-sure reachability.

Before presenting the algorithm, we need to extend the concept of LDF policies to max-minPPSs
and prove a basic lemma about them. We define a policy $\tau$ for the min player to be LDF if for all
policies $\sigma$ of the max player, $x = P_{\sigma,\tau}(x)$ is an LDF PPS. The following Lemma directly generalizes
Lemma 7.2 to max-minPPSs, and indeed its proof also provides the missing proof of Lemma 7.2 for
minPPSs.

Lemma 9.1. If a max-minPPS $x = P(x)$ has $g^* < 1$ then:

1. There is a deterministic LDF policy $\tau$ for the min player with $g^*_{*,\tau} < 1$,

2. $g^* \leq q^*_{*,\tau'}$ for any LDF policy $\tau'$ for the min player, and

3. There is a deterministic LDF policy $\tau^*$ for the min player whose associated LFP, $q^*_{*,\tau^*}$, has
$g^* = q^*_{*,\tau^*}$.

Proof.

1. Recall the P-time algorithm to detect whether $g^*_i = 1$ (see Proposition 4.1 and its proof).
That algorithm yields a deterministic policy $\tau$ with $g^*_{*,\tau} < 1$. For all max player policies $\sigma$,
we have $g^*_{\sigma,\tau} < 1$. Lemma 5.2 gives that all such PPSs $x = P_{\sigma,\tau}(x)$ are LDF. Thus, $\tau$ is LDF.

2. To prove part (2.), let $\tau'$ be any LDF policy for the min player. Note that $g^* = P(g^*) \leq
P_{*,\tau'}(g^*)$. So there exists a $\sigma$ with $g^* \leq P_{*,\tau'}(g^*) = P_{\sigma,\tau'}(g^*)$. Namely, $\sigma$ simply chooses, for
each equation $x_i = \max\{x_j, x_k\}$, the neighbor $x_j$ if $g^*_i = \max\{g^*_j, g^*_k\} = g^*_j$, and otherwise
chooses $x_k$, since then $g^*_i = \max\{g^*_j, g^*_k\} = g^*_k$. By applying Lemma 5.5 to the LDF-PPS
$x = P_{\sigma,\tau'}(x)$, with $y := g^*$, we get $g^* \leq q^*_{\sigma,\tau'} \leq q^*_{*,\tau'}$.

3. We will first show there exists a deterministic LDF policy $\tau^*$ such that $P(q^*_{*,\tau^*}) = q^*_{*,\tau^*}$, and
we will then argue that $g^* = q^*_{*,\tau^*}$. This proof is somewhat similar to the proof of Lemma 3.14
from [12], as well as the proof of Lemma 7.4 in this paper. The proof uses policy improvement
to demonstrate the existence of the claimed policy (but not as an algorithm to compute it).

Part (1.) of this Lemma yields that there is a deterministic LDF policy $\tau$ with $g^*_{*,\tau} < 1$. Thus
we have $q^*_{*,\tau} < 1$.

At step 1, we start policy improvement with $\tau_1 := \tau$. At step $i$, we have a deterministic
LDF policy $\tau_i$ with $q^*_{*,\tau_i} < 1$. If $P(q^*_{*,\tau_i}) = q^*_{*,\tau_i}$, stop (because then, as we will see, policy
$\tau_i$ satisfies $g^* = q^*_{*,\tau_i}$). Otherwise, there must be an $x_j$ with $P_j(q^*_{*,\tau_i}) < (q^*_{*,\tau_i})_j$, because
$P(q^*_{*,\tau_i}) \leq P_{*,\tau_i}(q^*_{*,\tau_i}) = q^*_{*,\tau_i}$. Note that $x_j$ belongs to min, because otherwise we would have
$P_j(q^*_{*,\tau_i}) = (P_{*,\tau_i}(q^*_{*,\tau_i}))_j = (q^*_{*,\tau_i})_j$. So we must have $P_j(x) = \min\{x_k, \tau_i(x_j)\}$ for some $x_k$. Then set $\tau_{i+1}$
to be the policy that selects $x_k$ at $x_j$ for some $x_j$ with $P_j(q^*_{*,\tau_i}) < (q^*_{*,\tau_i})_j$, but is otherwise
identical. We need first to show that $\tau_{i+1}$ is LDF.

Claim 9.2. The policy $\tau_{i+1}$ is LDF.

Proof. Suppose for a contradiction that $\tau_{i+1}$ is not LDF. Then there exists a policy $\sigma$ for max
such that a bottom SCC $S$ of $x = P_{\sigma,\tau_{i+1}}(x)$ is linear degenerate. This SCC must contain
$x_j$ and $x_k$ since otherwise $S$ would also be a linear degenerate SCC of $x = P_{\sigma,\tau_i}(x)$ and so $\tau_i$
would also not be LDF.

By construction, $P_{*,\tau_{i+1}}(q^*_{*,\tau_i}) \leq q^*_{*,\tau_i}$, with strict inequality $(P_{*,\tau_{i+1}}(q^*_{*,\tau_i}))_j < (q^*_{*,\tau_i})_j$ in the
coordinate $j \in S$.

Let $j'' = \arg\min_{j' \in S} (q^*_{*,\tau_i})_{j'}$ be any coordinate of the vector $(q^*_{*,\tau_i})_S$ which has minimum value.
We have $(P_{*,\tau_{i+1}}(q^*_{*,\tau_i}))_{j''} \leq (q^*_{*,\tau_i})_{j''}$.

We claim that any $x_{j'} \in S$ that appears in $(P_{*,\tau_{i+1}}(x))_{j''}$ must also have this minimum value,
i.e. $(q^*_{*,\tau_i})_{j'} = (q^*_{*,\tau_i})_{j''}$. If $(P_{*,\tau_{i+1}}(x))_{j''}$ has form L, then $(P_{*,\tau_{i+1}}(q^*_{*,\tau_i}))_{j''}$ is just a convex
combination of coordinates of $(q^*_{*,\tau_i})_S$. If any of these are bigger than their minimum value
then we would have $(P_{*,\tau_{i+1}}(q^*_{*,\tau_i}))_{j''} > (q^*_{*,\tau_i})_{j''}$ which is a contradiction. If $(P_{*,\tau_{i+1}}(x))_{j''}$
belongs to min, then it is equal to $x_{j'} \in S$. Again we have $(q^*_{*,\tau_i})_{j'} \leq (q^*_{*,\tau_i})_{j''}$ which is an
equality by minimality of $j''$. If $(P_{*,\tau_{i+1}}(x))_{j''}$ belongs to max, then we must have $(q^*_{*,\tau_i})_{j'} \leq
(P_{*,\tau_{i+1}}(q^*_{*,\tau_i}))_{j''} \leq (q^*_{*,\tau_i})_{j''}$ which again is an equality by minimality of $j''$. Lastly $(P_{*,\tau_{i+1}}(x))_{j''}$
can not have form Q since $S$ is linear degenerate in $x = P_{\sigma,\tau_{i+1}}(x)$. This completes the proof
of the claim that such $x_{j'}$ are also minimal.

Since $S$ is strongly-connected in $x = P_{\sigma,\tau_{i+1}}(x)$, $x_j$ and $x_k$ depend (directly or indirectly) on $x_{j''}$
in $x = P_{\sigma,\tau_{i+1}}(x)$ and so in $x = P_{*,\tau_{i+1}}(x)$ as well. By induction, we have that $(q^*_{*,\tau_i})_j = (q^*_{*,\tau_i})_{j''}$
and $(q^*_{*,\tau_i})_k = (q^*_{*,\tau_i})_{j''}$. But now we have $(q^*_{*,\tau_i})_j = (q^*_{*,\tau_i})_k$. This contradicts $(q^*_{*,\tau_i})_k < (q^*_{*,\tau_i})_j$
which is why we switched $x_j$ to $x_k$ in $\tau_{i+1}$. Thus $\tau_{i+1}$ is LDF. □

1. Initialize $S := \{x_i \in X \mid P_i(0) > 0\}$, i.e., the variables for which $P_i(x)$ contains a constant term.

2. Repeat the following until neither is applicable:

(a) If a variable $x_i$ is of form L or $M_{\max}$ and $P_i(x)$ contains a variable that is already in $S$,
add $x_i$ to $S$.

(b) If a variable $x_i$ is of form Q or $M_{\min}$ and both variables in $P_i(x)$ are already in $S$, add $x_i$
to $S$.

3. Let $F := \{x_i \in X - S \mid P_i(1) < 1, \text{ or } P_i(x) \text{ has form Q}\}$.

4. Repeat the following until no more variables can be added:

(a) If a variable $x_i \in X - S$ is of form L or $M_{\min}$ and $P_i(x)$ contains a term whose variable is
in $F$, add $x_i$ to $F$.

(b) If a variable $x_i \in X - S$ is of form $M_{\max}$ and both variables in $P_i(x)$ are in $F$, add $x_i$ to
$F$.

5. If $X = S \cup F$, terminate and output $F$.

6. Otherwise set $S := X - F$ and return to step 2.


Figure 1: P-time algorithm for computing $\{x_i \in X \mid g^*_i = 0\}$ for a max-minPPS with GFP $g^* < 1$.

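By construction of $\tau_{i+1}$, we have $P_{*,\tau_{i+1}}(q^*_{*,\tau_i}) \leq q^*_{*,\tau_i}$, with strict inequality in the $j$ coordinate.
There is a policy $\sigma$ for max that has $q^*_{*,\tau_{i+1}} = q^*_{\sigma,\tau_{i+1}}$. For such a $\sigma$, we have $P_{\sigma,\tau_{i+1}}(q^*_{*,\tau_i}) \leq q^*_{*,\tau_i}$
with strict inequality in the $j$ coordinate. By Lemma 5.5 applied to the LDF-PPS, $x =
P_{\sigma,\tau_{i+1}}(x)$ with $y := q^*_{*,\tau_i}$, this implies $q^*_{\sigma,\tau_{i+1}} \leq q^*_{*,\tau_i}$. So $q^*_{*,\tau_{i+1}} \leq q^*_{*,\tau_i}$.
This cannot be an equality since $P_{*,\tau_{i+1}}(q^*_{*,\tau_i}) \neq q^*_{*,\tau_i}$. So the algorithm cannot revisit the same
policy, i.e., for all $k \neq i$, we have $\tau_k \neq \tau_i$. Since there are only finitely many deterministic
policies, the algorithm must terminate.

So the algorithm terminates with a deterministic LDF policy $\tau^*$ with $P(q^*_{*,\tau^*}) = q^*_{*,\tau^*}$. All
that remains is to show that $g^* = q^*_{*,\tau^*}$. $P(q^*_{*,\tau^*}) = q^*_{*,\tau^*}$, so $q^*_{*,\tau^*}$ is a fixed point of $x = P(x)$
and the GFP $g^*$ satisfies $g^* \geq q^*_{*,\tau^*}$. By part (2.) of this Lemma, $g^* \leq q^*_{*,\tau^*}$. Therefore,
$g^* = q^*_{*,\tau^*}$.

□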

We are now ready to give the algorithm. First, we identify and remove all variables $x_i$ with
$g^*_i = 1$ (which we can do in P-time, by Proposition 4.1). Let $X$ be the set of all variables in the
remaining max-minPPS $x = P(x)$ in SNF form, with GFP $g^* < 1$. The algorithm is described in
Figure 1 and Theorem 9.3 shows that it computes the set $\{x_i \in X \mid g^*_i = 0\}$.
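The procedure of Figure 1 is a nested fixed-point computation on the dependency graph, and a direct (non-optimized) rendering might look as follows. This is a sketch under stated assumptions rather than the authors' implementation: the dictionary encoding of SNF equations (`("L", coefficients, constant)`, `("Q", j, k)`, `("Mmax", j, k)`, `("Mmin", j, k)`), the function names, and the example system are all hypothetical, and the input is assumed to already satisfy $g^* < 1$ in every coordinate.

```python
from fractions import Fraction as Fr

def variables_of(eq):
    # Variables appearing on the right-hand side of an SNF equation.
    return list(eq[1]) if eq[0] == "L" else [eq[1], eq[2]]

def zero_value_set(eqs):
    """Return {x_i : g*_i = 0}, assuming the GFP satisfies g* < 1 in every coordinate."""
    X = set(eqs)
    # Step 1: variables whose equation has a positive constant term.
    S = {v for v, e in eqs.items() if e[0] == "L" and e[2] > 0}
    while True:
        # Step 2: L / Mmax need one variable in S; Q / Mmin need both.
        changed = True
        while changed:
            changed = False
            for v in X - S:
                hits = [u in S for u in variables_of(eqs[v])]
                ok = any(hits) if eqs[v][0] in ("L", "Mmax") else all(hits)
                if ok:
                    S.add(v)
                    changed = True
        # Step 3: deficient (P_i(1) < 1) or quadratic variables outside S.
        F = {v for v in X - S
             if eqs[v][0] == "Q"
             or (eqs[v][0] == "L" and sum(eqs[v][1].values()) + eqs[v][2] < 1)}
        # Step 4: L / Mmin need one variable in F; Mmax needs both.
        changed = True
        while changed:
            changed = False
            for v in X - S - F:
                hits = [u in F for u in variables_of(eqs[v])]
                ok = all(hits) if eqs[v][0] == "Mmax" else any(hits)
                if ok:
                    F.add(v)
                    changed = True
        if S | F == X:          # step 5
            return F
        S = X - F               # step 6

# Hypothetical example with GFP a = d = u = w = 1/2 and b = c = e = 0.
EXAMPLE = {
    "a": ("L", {"b": Fr(1, 2)}, Fr(1, 2)),
    "b": ("L", {"c": Fr(1, 2)}, Fr(0)),
    "c": ("L", {"b": Fr(1)}, Fr(0)),
    "d": ("Mmax", "a", "b"),
    "e": ("Mmin", "a", "b"),
    "u": ("Mmin", "a", "w"),
    "w": ("Mmax", "u", "e"),
}
```

In the example, `u` and `w` are placed in neither set during the first pass, so step 6 moves them into `S` and the loop runs a second time before terminating.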

Theorem 9.3. The procedure in Figure 1 applied to a max-minPPS $x = P(x)$ with $g^* < 1$, always
terminates and outputs precisely the set of variables $\{x_i \in X \mid g^*_i = 0\}$, in time polynomial in
$|P|$. Furthermore we can compute in P-time a deterministic policy $\sigma$ for the max player such that
$(g^*_{\sigma,*})_i > 0$ for all variables $x_i$ in $\{x_i \in X \mid g^*_i > 0\}$.

Proof. Firstly we show that all variables $x_i$ in the output $F$ have $g^*_i = 0$. To do this we construct
an LDF policy $\tau^*$ for the min player such that $(q^*_{*,\tau^*})_F = 0$, and then argue that $g^*_F = 0$.

By Lemma 9.1 (1.), there is an LDF policy $\tau$ with $g^*_{*,\tau} < 1$. We define $\tau^*$ so that it agrees with
$\tau$ on variables in $S$. For a variable $x_i$ of form $M_{\min}$ in $F$, policy $\tau^*$ chooses a variable of $P_i(x)$ that
was already in $F$ and which caused $x_i$ to be added to $F$ in step 4. So, for any fixed policy $\sigma$ for
the max player, every variable $x_i$ in $F$ depends (directly or indirectly) in the PPS $x = P_{\sigma,\tau^*}(x)$ on
a variable $x_j$ in $F$ with $P_j(1) < 1$ or of form Q. So every variable in $F$ satisfies one of two out of
the three conditions in Lemma 5.1 part (ii), with respect to $x = P_{\sigma,\tau^*}(x)$. Now consider a variable
$x_i$ in $S$. $\tau$ is LDF, so for the fixed policy $\sigma$ for the max player, there is a path in the dependency
graph of $x = P_{\sigma,\tau}(x)$ from $x_i$ to an $x_j$ which satisfies one of the three conditions in Lemma 5.1 part
(ii). If this path does not contain any variable in $F$, then it is also a path in the dependency graph
of $x = P_{\sigma,\tau^*}(x)$. If it does, then $x_i$ depends on a variable in $F$, so by transitivity of dependence, it
also depends on a variable which satisfies one of the conditions in Lemma 5.1 (ii). The policy $\sigma$ for
the max player was chosen arbitrarily. So $\tau^*$ is LDF.

Next we need to show that $(q^*_{*,\tau^*})_F = 0$. Since for all variables $x_i$ in $F$, $P_i(x)$ does not contain
a constant term, we have $(P_{*,\tau^*}(0))_F = 0$. Note that all variables in $F$ of type L, $M_{\min}$ and $M_{\max}$
depend directly only on variables of $F$ in $x = P_{*,\tau^*}(x)$, and every variable of type Q depends on
some variable in $F$ (otherwise it would have been added to $S$ in the previous step 2). It follows
then by an easy induction that $(P^k_{*,\tau^*}(0))_F = 0$ for all $k$. So $(q^*_{*,\tau^*})_F = 0$.

Since $\tau^*$ is an LDF policy, Lemma 9.1 (2.) tells us that $q^*_{*,\tau^*} \geq g^*$. Since $g^* \geq 0$, we have that
$g^*_F = 0$ as required.

Finally, we need to show that $g^*_S > 0$, and specify a policy $\sigma$ for the max player that ensures
this. We will only specify the policy for the $M_{\max}$ nodes in $S$; the choice for the other $M_{\max}$ nodes
does not matter and can be arbitrary. To show the claim, we need to show inductively that when
we add a variable $x_i$ to $S$, if all variables $x_j$ already in $S$ have $g^*_j > 0$ (and $(g^*_{\sigma,*})_j > 0$), then $g^*_i > 0$
(and $(g^*_{\sigma,*})_i > 0$).

For the basis case, note that for the variables $x_i$ added to $S$ in step 1, $P_i(x)$ contains a positive
constant term, hence $g^*_i \geq P_i(0) > 0$. Consider now the variables $x_i$ added in an execution of
step 2. If $P_i(x)$ is of form L then it contains an $x_j$ that was added earlier to $S$; hence $g^*_j > 0$
(and $(g^*_{\sigma,*})_j > 0$), and thus $g^*_i = P_i(g^*) > 0$ (and $(g^*_{\sigma,*})_i > 0$). If $x_i = x_j x_k$ or $x_i = \min\{x_j, x_k\}$
for some $x_j, x_k$, then both $x_j, x_k$ were added earlier to $S$; hence $g^*_j > 0$ and $g^*_k > 0$, and thus
$g^*_i = P_i(g^*) > 0$ (and similarly, $(g^*_{\sigma,*})_i > 0$). If $x_i = \max\{x_j, x_k\}$, then at least one of $x_j, x_k$ was
added earlier to $S$, say $x_j$, hence $g^*_j > 0$ and $(g^*_{\sigma,*})_j > 0$. Let the policy $\sigma$ choose $\sigma(x_i) = x_j$; then
$g^*_i \geq (g^*_{\sigma,*})_i = (g^*_{\sigma,*})_j > 0$.

Consider now the set $R$ of variables added to $S$ in an execution of step 6 and assume inductively
that all the variables $x_j$ assigned so far to $S$ have $g^*_j > 0$ (and $(g^*_{\sigma,*})_j > 0$). Since the variables
$x_i$ of $R$ were not added to $F$ in steps 3-4, they all satisfy $P_i(1) = 1$, they are not of type Q, every
variable of type L or $M_{\min}$ does not depend directly on any variable in $F$, and every variable of type
$M_{\max}$ depends directly on at least one variable that is not in $F$. Let the policy $\sigma$ choose actions for
variables in $S$ as before, and for each variable $x_i$ of type $M_{\max}$ in $R$ let $\sigma$ choose an arbitrary variable
of $P_i(x)$ that is not in $F$. (For the variables $x_i$ of type $M_{\max}$ that are in $F$, the choices of $\sigma$ do not
matter at this point.) Then the dependency graph of $x = P_{\sigma,*}(x)$ has no edges from $R$ to $F$.

We claim that $(g^*_{\sigma,*})_R > 0$. Let $\tau'$ be an LDF policy for the min player in the minPPS,
$x = P_{\sigma,*}(x)$, such that $g^*_{\sigma,*} = q^*_{\sigma,\tau'}$; we know $\tau'$ exists by Lemma 7.2 (3.). Let $U$ be the set of
variables $x_i \in R$ with $(q^*_{\sigma,\tau'})_i = 0$. We need to show that $U$ is empty. Consider any $x_j \in U$. We
claim that any variable $x_k$ appearing in $(P_{\sigma,\tau'}(x))_j$ is in $U$. We know that the dependency graph
of $x = P_{\sigma,*}(x)$ has no edges from $R$ to $F$, so $x_k \notin F$. It remains to show that $(q^*_{\sigma,\tau'})_k = 0$, since
then by inductive assumption $x_k \notin S$, and so we must have $x_k \in R$, and thus $x_k \in U$. If $x_j$ is of
type M, then $(P_{\sigma,\tau'}(x))_j = x_k$, so $(q^*_{\sigma,\tau'})_k = (q^*_{\sigma,\tau'})_j = 0$. If $x_j$ is of type L and $(q^*_{\sigma,\tau'})_k > 0$, then
$(q^*_{\sigma,\tau'})_j > 0$, so we must have $(q^*_{\sigma,\tau'})_k = 0$. Since $x_j \in U$, it can not have type Q since such variables
not in $S$ were put in $F$. So $x_k \in U$.

We have that variables in $U$ depend in the PPS $x = P_{\sigma,\tau'}(x)$ only on other variables in $U$.
However, no variable $x_j$ that satisfied one of the three conditions of Lemma 5.1 (ii) is in $U \subseteq R$
since it would have been put in $S$ or $F$ in an earlier step. Since $x = P_{\sigma,\tau'}(x)$ is an LDF-PPS,
for any $x_i$, by Lemma 5.1 there is a path from $x_i$ to such an $x_j$. If $x_i \in U$, then this path must
remain entirely in $U$ which is a contradiction. Therefore $U$ is empty and we have that $(g^*_{\sigma,*})_R > 0$
as required.

The fact that the algorithm runs in P-time follows easily from the fact that each iteration of the 
outer loop adds at least one element to S, and no element is ever removed. The individual steps 
of the algorithm are each easily computable in P-time, by performing AND-OR reachability on the 
dependency graph. □ 

We remark that the policy r* for the min player constructed in the proof of Theorem 19.31 does 
not necessarily ensure value 0 in the GFP for a variable Xi with g* = 0 (i.e., it is possible that 
ig*T*)i >0)- III fact, there may not exist any such policy (deterministic or randomized) ensuring 
value 0 for the min player in a max-minPPS (or even a minPPS). Similarly, in a BSSG (or even 
RMDP) with optimal non-reachability value 0 (i.e. reachability value 1), there may not exist any 
optimal static strategy for the player that wants to minimize the non-reachability probability; recall 
Example 13.11 We show however that we can construct a non-static optimal deterministic strategy. 

Theorem 9.4. There is a non-static deterministic optimal strategy for the player minimizing the 
probability o/not reaching a target type in a BSSG, if the value of not reaching the target is 0. 

Proof. Let $x = P(x)$ be the max-minPPS for the given BSSG, whose GFP $g^*$ gives the
non-reachability values. Let $Z = \{x_i \mid g^*_i = 0\}$ be the final value of the set $F$ that is returned by the
algorithm of Fig. 1. Let $\tau^*$ be the LDF policy for player min constructed in the proof of Theorem
9.3 that has the property that $g^*_i = 0$ iff $(q^*_{*,\tau^*})_i = 0$. Recall that $\tau^*$ selects for each type $M_{\min}$
variable $x_i \in Z$ a variable $x_j$ of $P_i(x)$ that was added earlier to $F$ (and hence is also in $Z$). From
Proposition 4.1 we can also compute in P-time an LDF policy $\tau$ with $g^*_{*,\tau} < 1$. We combine $\tau^*$ and
$\tau$ in the following non-static policy:

We designate one member of our initial population with type in $Z$ to be the queen. The rest
of the population are workers. We use policy $\tau^*$ for the queen and $\tau$ for the workers. In following
generations, if we have not reached an object of the target type, we choose one of the children in
$Z$ of the last generation's queen (which we next show must exist) to be the new queen. Again, all
other members of the population are workers.

We first show the policy is well defined, i.e., we can always find a new queen as prescribed. If
$g^*_i = 0$, then $P_i(g^*) = (P_{*,\tau^*}(g^*))_i = g^*_i = 0$. If $P_i(x)$ has form L then all $x_j$ appearing in $P_i(x)$ have
$g^*_j = 0$ and there is no constant term. If $P_i(x)$ has form Q then at least one $x_j$ in $P_i(x)$ will have
$g^*_j = 0$. If $P_i(x)$ has form $M_{\min}$, then the $x_j = \tau^*(x_i)$ in $(P_{*,\tau^*}(x))_i$ has $g^*_j = 0$. Finally, if $P_i(x)$ has
form $M_{\max}$, then for all variables $x_j$ in $(P_{*,\tau^*}(x))_i$ we have $g^*_j = 0$. In other words, using $\tau^*$, an object
of a type in $Z$ has offspring which either includes the target or an object of a type in $Z$. Thus the
next generation always includes a potential choice of queen.

Next we show that if we never reach the target type, the queen has more than one child infinitely
often with probability 1. Indeed we claim that with probability at least $2^{-|P|}$ within the next $n$
steps, either the queen has more than one child or we reach the target. For this purpose, we define
inductively for every variable $x_i \in Z$ a (directed) tree $T_i$ with root $x_i$, which shows why $x_i$ was
added to $F$ in the final iteration of the algorithm. If $P_i(\mathbf{1}) < 1$ or $x_i$ has type Q then $T_i$ is a single
node labeled $x_i$. If $x_i$ has type L (respectively $M_{\min}$) and was added in step 4 because of variable
$x_j \in P_i(x)$ that was already in $F$ (resp., where $x_j = \tau^*(x_i)$), then $T_i$ consists of the edge $x_i \to x_j$
and the subtree $T_j$ rooted at $x_j$. If $x_i$ has type $M_{\max}$ then $T_i$ contains edges $x_i \to x_j$ for all $x_j \in P_i(x)$
and a subtree $T_j$ hanging from each $x_j$.
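The inductive definition of the trees $T_i$ can be illustrated with a minimal Python sketch. The data layout (`kind`, `is_leaf`, `child_of`, `choices`) is hypothetical and chosen only for this illustration; it is not taken from the algorithm of Fig. 1. The sketch assumes that `child_of` and `choices` only point to variables that were added to $F$ earlier, which is what makes the recursion terminate.

```python
def witness_tree(i, kind, is_leaf, child_of, choices):
    """Return T_i as a nested pair (variable, list of subtrees).

    kind[i]     : one of "L", "Q", "Mmin", "Mmax"
    is_leaf[i]  : True when P_i(1) < 1 (hypothetical precomputed flag)
    child_of[i] : for type L, the variable already in F that caused x_i to be
                  added; for type Mmin, the variable tau*(x_i)
    choices[i]  : for type Mmax, all variables appearing in P_i(x)
    """
    # Leaves: P_i(1) < 1, or a type Q variable.
    if is_leaf[i] or kind[i] == "Q":
        return (i, [])
    # Types L and Mmin have exactly one child in the tree.
    if kind[i] in ("L", "Mmin"):
        j = child_of[i]
        return (i, [witness_tree(j, kind, is_leaf, child_of, choices)])
    # Type Mmax: one subtree per variable the max player may choose.
    return (i, [witness_tree(j, kind, is_leaf, child_of, choices)
                for j in choices[i]])


def height(tree):
    """Number of edges on a longest root-to-leaf path."""
    _, subtrees = tree
    return 0 if not subtrees else 1 + max(height(t) for t in subtrees)


# Toy instance (made up for illustration): x1 is Mmax with choices x2 and x4,
# x2 is type L added because of x3, x3 is type Q, and x4 has P_4(1) < 1.
kind = {1: "Mmax", 2: "L", 3: "Q", 4: "L"}
is_leaf = {1: False, 2: False, 3: False, 4: True}
child_of = {2: 3}
choices = {1: [2, 4]}
tree = witness_tree(1, kind, is_leaf, child_of, choices)
```

In this toy instance every root-to-leaf path ends either at a type Q variable (the queen branches) or at a variable with $P_i(\mathbf{1}) < 1$ (the target is reached), matching the two leaf cases of the definition.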

Suppose that in some step the queen is an object corresponding to $x_i \in Z$. Then with positive
probability (in fact probability at least $2^{-|P|}$) in the next (at most) $n$ steps, the process will follow
a root-to-leaf path of the tree $T_i$, regardless of the strategy of the max player: whenever the path
is at a node of type L, the process follows the edge to the (unique) child (which becomes the new
queen) with the probability of the corresponding transition of the BSSG; when it is at a node of
type $M_{\min}$, it necessarily follows the edge to its child because we are using policy $\tau^*$ for the queen;
and when it is at a node of type $M_{\max}$, it follows an edge selected by the max player. Thus, with
probability at least $2^{-|P|}$, the process arrives at a leaf of $T_i$. If the leaf corresponds to a variable $x_j$
with $P_j(\mathbf{1}) < 1$ then the process has reached a target type. If the leaf corresponds to a variable of
type Q then the queen generates two children.

Thus, if the queen never reaches the target throughout the process, then the queen will generate 
more than one child infinitely often with probability 1. 

By our choice of the policy $\tau$ followed by the workers, $g^*_{*,\tau} < 1$. The descendants of a worker
of type $x_i$ have positive probability $(1 - g^*_{*,\tau})_i > 0$ of reaching the target regardless of the strategy
of the max player (an explicit lower bound on this probability is given by Lemma 3.20 of [12] applied
to the maxPPS $x = P_{*,\tau}(x)$). For each worker descended from the queen these probabilities are
independent. So with probability 1, one of them will have descendants that reach the target. Thus we
reach the target with probability 1. □
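The final step of the proof can be spelled out as a short calculation. The symbol $p$ below is introduced here only for illustration: we take $p = \min_i (1 - g^*_{*,\tau})_i$, which is strictly positive because there are finitely many types and each term is positive. Under the independence stated in the proof, if the queen has produced $k$ workers, then

$$\Pr[\text{no worker among these } k \text{ has descendants that reach the target}] \;\le\; (1-p)^k \;\xrightarrow{\,k \to \infty\,}\; 0.$$

Since, as long as the target is not reached, the queen has more than one child infinitely often with probability 1, the number $k$ of workers grows without bound, and so the probability of never reaching the target is 0.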

10 Approximating the value of BSSGs and the GFP of max-minPPSs 

In this section we build on the prior results to show that the value of a Branching Simple Stochastic 
Game (BSSG) with reachability as the objective can be approximated in TFNP. Equivalently, we 
show that the GFP, g*, of a max-minPPS can be approximated in TFNP. 

We will first show that we can test in polynomial time whether a (deterministic) policy $\tau$ for
the min player in a max-minPPS is LDF. Recall, from the earlier definition, what it means for a
policy $\tau$ for the min player to be LDF.

We borrow the concept of a closed set, studied in [7], which we adapt for maxPPSs as follows: 

Definition 10.1. A closed set of a maxPPS, $x = P(x)$, is a subset of variables $S$ such that:
(i) the dependency subgraph induced by $S$ is nontrivial (i.e., contains at least one edge, but not
necessarily more than one variable), and is strongly connected (i.e., every variable in $S$ depends
on every variable in $S$ via a directed path going only through variables in $S$); (ii) $S$ contains only
variables of type M and L; and (iii) for all variables $x_i$ in $S$ of type L, $P(x)_i$ contains only variables
in $S$, and furthermore $P_i(\mathbf{0}) = 0$ and $P_i(\mathbf{1}) = 1$.

Lemma 10.2. A policy $\tau$ for the min player is LDF if and only if the maxPPS $x = P_{*,\tau}(x)$ contains
no closed sets.

Proof. To prove the $(\Rightarrow)$ direction, suppose that $S$ is a closed set. Since $S$ is strongly connected,
every variable $x_i$ of form $M_{\max}$ in $S$ must have a choice in $S$, because otherwise it would not
depend on any variable in $S$ via $S$. Let $\sigma$ be a policy that picks a choice in $S$ for every variable $x_i$
of form $M_{\max}$ in $S$. No variable in $S$ in $x = P_{\sigma,\tau}(x)$ depends on any variable outside of $S$. So there
must be a bottom SCC, $T$, of $x = P_{\sigma,\tau}(x)$ with $T \subseteq S$. $T$ is a linear degenerate SCC, since it contains
no variables of form Q and has $(P_{\sigma,\tau})_T(\mathbf{0}) = \mathbf{0}$ and $(P_{\sigma,\tau})_T(\mathbf{1}) = \mathbf{1}$. So $\tau$ is not LDF.

To prove the $(\Leftarrow)$ direction, suppose that $\tau$ is a policy for min in $x = P(x)$ which is not LDF,
i.e., there exists a $\sigma$ such that there is a bottom SCC $S$ of $x = P_{\sigma,\tau}(x)$ that is linear degenerate.
We claim that $S$ is a closed set in $x = P_{*,\tau}(x)$. It is strongly connected because it is an SCC of
$x = P_{\sigma,\tau}(x)$. It contains no variables of form Q and every variable of form L satisfies (iii). □
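The three conditions of Definition 10.1 can be checked directly for a given candidate set. The following Python sketch is only an illustration under assumed inputs: the dictionaries `succ`, `kind`, `p0_is_zero` and `p1_is_one` are hypothetical encodings of the dependency graph, the variable types, and the conditions $P_i(\mathbf{0}) = 0$ and $P_i(\mathbf{1}) = 1$; they are not notation from the paper.

```python
def reachable(start, graph):
    """Set of nodes reachable from `start` in `graph` (dict node -> list)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_closed_set(S, succ, kind, p0_is_zero, p1_is_one):
    """Check conditions (i)-(iii) of Definition 10.1 for the set S."""
    S = set(S)
    # (ii) only variables of type M and L.
    if any(kind[v] not in ("M", "L") for v in S):
        return False
    # (iii) type L variables depend only on S, with P_i(0) = 0 and P_i(1) = 1.
    for v in S:
        if kind[v] == "L":
            if not set(succ[v]) <= S:
                return False
            if not (p0_is_zero[v] and p1_is_one[v]):
                return False
    # (i) the induced subgraph has at least one edge and is strongly connected.
    inner = {v: [w for w in succ[v] if w in S] for v in S}
    if not any(inner.values()):
        return False
    rev = {v: [] for v in S}
    for v, ws in inner.items():
        for w in ws:
            rev[w].append(v)
    root = next(iter(S))
    return reachable(root, inner) == S and reachable(root, rev) == S


# Toy maxPPS (made up): a is type M with choices b and c, b is type L with
# P_b(x) = x_a, and c is type Q.
succ = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
kind = {"a": "M", "b": "L", "c": "Q"}
p0_is_zero = {"b": True}
p1_is_one = {"b": True}
```

In the toy instance, `{"a", "b"}` is a closed set because the max player can choose `b` at `a` forever, whereas any set containing the type Q variable `c` is not.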

Lemma 10.3. Given a max-minPPS $x = P(x)$ and a policy $\tau$ for the min player, we can determine
in linear time whether $\tau$ is LDF.

Proof. By Lemma 10.2, $\tau$ is LDF if the maxPPS $x = P_{*,\tau}(x)$ contains no closed sets. A P-time
algorithm was given in [7] to find the maximal closed subsets (called the closed components) of a
finite state MDP (and an improved algorithm was given in [5]). These algorithms could be readily
adapted to our setting, in order to compute all maximal closed subsets of the maxPPS. However,
our problem is simpler here: we only need to determine if there is any closed set. We can test this
condition more directly as follows. Let $G$ be the dependency graph of the maxPPS $x = P_{*,\tau}(x)$.
Note that the variables $x_i$ of type Min have now become type L variables in the maxPPS $x = P_{*,\tau}(x)$
and the corresponding polynomial $(P_{*,\tau})_i(x)$ satisfies $(P_{*,\tau})_i(\mathbf{0}) = 0$ and $(P_{*,\tau})_i(\mathbf{1}) = 1$.

Perform AND-OR reachability on $G$, where the set $T$ of target nodes includes all nodes
(variables) of type Q and all nodes $x_i$ of type L where $P_i(\mathbf{0}) > 0$ or $P_i(\mathbf{1}) < 1$; the set of OR nodes
consists of all type L variables $x_i$ where $P_i(\mathbf{0}) = 0$ and $P_i(\mathbf{1}) = 1$ (this includes all type Min nodes);
and the set of AND nodes consists of all type Max variables. Recall the earlier definition of the set
of nodes that can AND-OR reach the set $T$. Let $U$ be the set of
nodes that cannot AND-OR reach the set $T$ of target nodes. We claim that $\tau$ is LDF if and only if
$U$ is empty.

Suppose first that $\tau$ is not LDF. Then the maxPPS $x = P_{*,\tau}(x)$ contains a closed set $S$. By
the definition of a closed set, every type Max node of $S$ has a successor in $S$ (because $S$ is strongly
connected), and every type L node $x_i$ of $S$ has all its successors in $S$ and satisfies $P_i(\mathbf{0}) = 0$ and
$P_i(\mathbf{1}) = 1$. Therefore, when we perform AND-OR reachability, no node of $S$ will be accessed, i.e.,
no node of $S$ can AND-OR reach the set $T$ of target nodes. Hence $S \subseteq U$ and thus $U$ is not empty.

On the other hand, suppose that $U$ is not empty, and let $S$ be a bottom SCC of the subgraph
$G[U]$ of $G$ induced by $U$. Then $S$ satisfies the conditions of a closed set. Hence $\tau$ is not LDF. □
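The AND-OR reachability test used in the proof of Lemma 10.3 can be sketched in Python as a backward search from the target set. This is an illustrative sketch, not the authors' implementation: the function name and the dictionary encoding are assumptions, every node is assumed to appear as a key of `succ`, and every non-target node is assumed to have at least one successor. Each edge is examined at most once, in line with the linear-time bound in the lemma.

```python
from collections import deque


def and_or_unreachable(succ, kind, targets):
    """Return the set U of nodes that cannot AND-OR reach `targets`.

    succ    : dict node -> iterable of successors (dependency graph G)
    kind    : dict node -> "OR" or "AND" (not consulted for target nodes)
    targets : set of target nodes T
    """
    succ = {v: set(ws) for v, ws in succ.items()}
    pred = {v: set() for v in succ}
    for v, ws in succ.items():
        for w in ws:
            pred[w].add(v)
    # For an AND node, count the successors not yet known to reach T.
    pending = {v: len(ws) for v, ws in succ.items()}

    reached = set(targets)
    queue = deque(reached)
    while queue:
        w = queue.popleft()
        for v in pred[w]:
            if v in reached:
                continue
            pending[v] -= 1
            # An OR node needs one successor that reaches T;
            # an AND node needs all of its successors to reach T.
            if kind[v] == "OR" or pending[v] == 0:
                reached.add(v)
                queue.append(v)
    return set(succ) - reached


def is_ldf(succ, kind, targets):
    """tau is LDF iff no node fails to AND-OR reach T (Lemma 10.3)."""
    return not and_or_unreachable(succ, kind, targets)


# Toy graph (made up): a is type Max (AND) with successors b and c,
# b is a type L node with P_b(0) = 0, P_b(1) = 1 (OR), c is type Q (target).
succ_bad = {"a": ["b", "c"], "b": ["a"], "c": []}
kind = {"a": "AND", "b": "OR"}
U = and_or_unreachable(succ_bad, kind, {"c"})
```

In the toy graph the max player can avoid `c` by always choosing `b`, so `U` is `{"a", "b"}` and the policy is not LDF. Removing the edge from `a` to `b` makes `U` empty.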

We can show now the main result of this section. 

Theorem 10.4. The problem of approximating the GFP of a max-minPPS $x = P(x)$, i.e., computing
a vector $g \in [0,1]^n$ such that $\|g^* - g\|_\infty \le \epsilon$, is in TFNP.

Proof. We first compute in polynomial time, by Proposition 4.1, the set of indices $D = \{i \in [n] \mid
g^*_i = 1\}$. We then eliminate all variables $x_i$ such that $i \in D$ from the max-minPPS, substituting
them by the value 1 and removing their corresponding equations.

So, assume henceforth that $g^* < 1$. By Lemma 9.1, for any deterministic LDF policy $\tau$ for min,
and deterministic policy $\sigma$ for max, we have that $g^*_{\sigma,*} \le g^* \le q^*_{*,\tau}$. Furthermore, by Lemma 9.1 and
Corollary 3.3 there exist such policies which make both of these inequalities tight. Thus, to put
the problem of approximating $g^*$ in TFNP, it suffices to guess such policies with $g^*_{\sigma,*}$ and $q^*_{*,\tau}$ close
enough to each other, approximate the two vectors, and verify that they are close.

In more detail, the algorithm is as follows. Guess deterministic policies $\sigma$ and $\tau$ for the max
and min players. Check whether $\tau$ is LDF (in P-time, by Lemma 10.3). If it is not, we reject this
guess. Otherwise, using our P-time algorithm for computing $g^*$ for minPPSs, together with the
P-time algorithm for computing $q^*$ for maxPPSs from [12], we compute approximations $v_\sigma$ and $v_\tau$
to $g^*_{\sigma,*}$ and $q^*_{*,\tau}$ from below, such that $\|v_\sigma - g^*_{\sigma,*}\|_\infty \le \epsilon/2$ and $\|v_\tau - q^*_{*,\tau}\|_\infty \le \epsilon/2$. Check whether
$\|v_\sigma - v_\tau\|_\infty \le \epsilon/2$. If so, then output $g = v_\sigma$; otherwise, we reject this guess.

We have to show that the algorithm is sound and complete: (1) There is at least one guess
for which the algorithm produces an output, and (2) For every guess $\sigma, \tau$ for which the algorithm
produces an output, the output $g = v_\sigma$ is within $\epsilon$ of $g^*$.

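For claim (1), consider a deterministic policy $\sigma$ for the max player and deterministic LDF policy
$\tau$ for the min player such that $g^*_{\sigma,*} = g^* = q^*_{*,\tau}$. The algorithm computes values $v_\sigma \le g^*_{\sigma,*} = g^*$ and
$v_\tau \le q^*_{*,\tau} = g^*$ such that $\|v_\sigma - g^*\|_\infty \le \epsilon/2$ and $\|v_\tau - g^*\|_\infty \le \epsilon/2$. Since both $v_\sigma$ and $v_\tau$ lie
coordinate-wise between $g^* - \epsilon/2$ and $g^*$, we get $\|v_\sigma - v_\tau\|_\infty \le \epsilon/2$, so
the algorithm accepts the guess and outputs $g = v_\sigma$.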

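For claim (2), suppose that the algorithm accepts a guess $\sigma, \tau$ and outputs $g = v_\sigma$. Since
$v_\sigma \le g^*_{\sigma,*} \le g^* \le q^*_{*,\tau}$, we have:

$$\|g^* - v_\sigma\|_\infty \;\le\; \|q^*_{*,\tau} - v_\sigma\|_\infty \;\le\; \|q^*_{*,\tau} - v_\tau\|_\infty + \|v_\tau - v_\sigma\|_\infty \;\le\; \frac{\epsilon}{2} + \frac{\epsilon}{2} \;=\; \epsilon.$$

□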
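The acceptance test at the heart of the proof of Theorem 10.4 can be sketched in Python. The sketch covers only the final verification step; it assumes that the LDF check of Lemma 10.3 and the two approximation procedures have already been run, and takes their results as inputs. The function names and the list-of-floats representation are hypothetical choices made for this illustration.

```python
def sup_norm_diff(u, v):
    """The l-infinity distance between two vectors of equal length."""
    return max(abs(a - b) for a, b in zip(u, v))


def accept_guess(v_sigma, v_tau, eps, tau_is_ldf):
    """Return the output g = v_sigma if the guess is accepted, else None.

    v_sigma    : approximation from below of g*_{sigma,*}, within eps/2
    v_tau      : approximation from below of q*_{*,tau}, within eps/2
    tau_is_ldf : result of the LDF test on the guessed min policy
    """
    if not tau_is_ldf:
        return None  # reject: tau is not LDF
    if sup_norm_diff(v_sigma, v_tau) <= eps / 2:
        return list(v_sigma)  # accept: v_sigma is within eps of g*
    return None  # reject: the two bounds are too far apart


# Made-up numbers for illustration only.
accepted = accept_guess([0.45, 0.20], [0.48, 0.24], eps=0.2, tau_is_ldf=True)
rejected = accept_guess([0.45, 0.20], [0.90, 0.24], eps=0.2, tau_is_ldf=True)
```

When a guess is accepted, the chain of inequalities in claim (2) guarantees that the returned vector is within $\epsilon$ of $g^*$, even though $g^*$ itself is never computed.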
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